Autonomous citation matching
- 1 April 1999
- conference paper
- Published by Association for Computing Machinery (ACM)
- pp. 392-393
- https://doi.org/10.1145/301136.301255
Abstract
Advances in computational resources and the communications infrastructure, and the rapid rise of the World Wide Web, have led to the increasingly widespread availability of scientific papers in electronic form. Scientific papers usually contain citations to previous work, and indices of these citations are valuable for literature search, analysis, and evaluation. Current citation indices of the scientific literature are constructed using manual effort and are typically expensive. Part of the reason for using manual effort is the great variability of citation syntax: it can be difficult to autonomously determine whether two citations refer to the same article because citations can be written in many different formats. We present machine learning techniques that identify variant forms of citations to the same paper. A number of algorithms are presented. An algorithm based on word and phrase matching is found to perform best, and is sufficiently accurate for unassisted use in an autonomous citation indexing system. An algorithm based on a string edit distance performs poorly in comparison. A computationally efficient subfield algorithm is also presented. The accuracy and efficiency of all algorithms are quantitatively compared on a number of datasets.
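The abstract contrasts word-and-phrase matching with string edit distance but does not describe either algorithm. The following is a minimal sketch under stated assumptions, not the authors' method: it scores a pair of citations by the Jaccard overlap of their normalized words plus adjacent two-word phrases, and computes a standard Levenshtein edit distance for contrast. The function names (`normalize`, `word_phrase_score`, `edit_distance`), the choice of Jaccard similarity, and the example citation strings are all hypothetical.

```python
import re


def normalize(citation: str) -> list[str]:
    """Lowercase a citation and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", citation.lower())


def word_phrase_score(a: str, b: str) -> float:
    """Hypothetical word-and-phrase similarity in [0, 1] (not the paper's algorithm).

    Features are single words plus adjacent two-word phrases (bigrams);
    the score is the Jaccard similarity of the two feature sets.
    """
    def features(text: str) -> set[str]:
        words = normalize(text)
        bigrams = {" ".join(pair) for pair in zip(words, words[1:])}
        return set(words) | bigrams

    fa, fb = features(a), features(b)
    if not fa and not fb:
        return 1.0  # two empty citations are treated as identical
    return len(fa & fb) / len(fa | fb)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


# Invented example citations: c1 and c2 are two formats of one paper, c3 is unrelated.
c1 = "J. Smith and A. Jones. Fast string matching. In Proc. of the Example Conf., 1997."
c2 = "Smith, J., Jones, A. (1997) Fast String Matching, Example Conference."
c3 = "K. Brown. Slow graph search. Journal of Graphs, 12(3):1-9, 1995."

if __name__ == "__main__":
    print(f"word/phrase c1~c2: {word_phrase_score(c1, c2):.3f}")
    print(f"word/phrase c1~c3: {word_phrase_score(c1, c3):.3f}")
    print(f"edit distance c1~c2: {edit_distance(c1, c2)}")
    print(f"edit distance c1~c3: {edit_distance(c1, c3)}")
```

A set-based score ignores field order, so moving the year or reordering author initials costs little, whereas edit distance charges for every character that shifts position. Common venue words such as "proc", "of", and "the" can still inflate overlap between unrelated citations; a practical matcher would likely need term weighting or field-aware (subfield) comparison, which this sketch omits.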