The name ambiguity problem is especially challenging in the field of bibliographic digital libraries. The problem is amplified when names are collected from heterogeneous sources. This is the case in the Scholarometer system, which performs bibliometric analysis by cross-correlating author names in user queries with those retrieved from digital libraries. The uncontrolled nature of user-generated annotations is very valuable, but creates the need to detect ambiguous names. Our goal is to detect ambiguous names at query time by mining digital library annotation data, thereby decreasing noise in the bibliometric analysis. We explore three kinds of heuristic features based on citations, metadata, and crowdsourced topics in a supervised learning framework. The proposed approach achieves almost 80% accuracy. Finally, we compare the performance of ambiguous author detection in Scholarometer using Google Scholar against a baseline based on Microsoft Academic Search.
This is the direct link to the paper. Check out the project’s website.