Ambiguous author query detection

The name ambiguity problem is especially challenging in the field of bibliographic digital libraries. The problem is amplified when names are collected from heterogeneous sources. This is the case in the Scholarometer system, which performs bibliometric analysis by cross-correlating author names in user queries with those retrieved from digital libraries. The uncontrolled nature of user-generated annotations is very valuable, but creates the need to detect ambiguous names. Our goal is to detect ambiguous names at query time by mining digital library annotation data, thereby decreasing noise in the bibliometric analysis. We explore three kinds of heuristic features based on citations, metadata, and crowdsourced topics in a supervised learning framework. The proposed approach achieves almost 80% accuracy. Finally, we compare the performance of ambiguous author detection in Scholarometer using Google Scholar against a baseline based on Microsoft Academic Search.

This is the direct link to the paper. Check out the project’s website.