Reference reconciliation in complex information spaces
- 14 June 2005
- proceedings article
- Published by Association for Computing Machinery (ACM)
Abstract
Reference reconciliation is the problem of identifying when different references (i.e., sets of attribute values) in a dataset correspond to the same real-world entity. Most previous literature assumed that references belong to a single class with a fair number of attributes (e.g., research publications). We consider complex information spaces: our references belong to multiple related classes and each reference may have very few attribute values. A prime example of such a space is Personal Information Management, where the goal is to provide a coherent view of all the information on one's desktop.
Our reconciliation algorithm has three principal features. First, we exploit the associations between references to design new methods for reference comparison. Second, we propagate information between reconciliation decisions to accumulate positive and negative evidence. Third, we gradually enrich references by merging attribute values. Our experiments show that (1) we considerably improve precision and recall over standard methods on a diverse set of personal information datasets, and (2) there are advantages to using our algorithm even on a standard citation dataset benchmark.
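To make the third feature concrete, the following is a minimal Python sketch of enrichment by merging attribute values. It is not the paper's algorithm: the matching rule in `similar` (shared email, or agreement on at least two attributes), the function names, and the sample records are all assumptions made for illustration, and association-based comparison and negative evidence are left out entirely.

```python
from itertools import combinations


def similar(a, b):
    """Hypothetical rule: match on a shared email, or on two shared attributes."""
    shared = [k for k in a.keys() & b.keys() if a[k] & b[k]]
    return "email" in shared or len(shared) >= 2


def merge(a, b):
    """Enrich a reference by taking the union of both references' values."""
    out = {k: set(v) for k, v in a.items()}
    for k, v in b.items():
        out.setdefault(k, set()).update(v)
    return out


def reconcile(refs):
    """Merge matching references until no remaining pair matches."""
    clusters = [({i}, {k: set(v) for k, v in r.items()})
                for i, r in enumerate(refs)]
    changed = True
    while changed:
        changed = False
        for (i, (ids_a, a)), (j, (ids_b, b)) in combinations(enumerate(clusters), 2):
            if similar(a, b):
                # The merged reference carries more values, so it may match
                # references that neither original matched on its own.
                clusters[i] = (ids_a | ids_b, merge(a, b))
                del clusters[j]  # safe: i < j, and we restart the scan
                changed = True
                break
    return [sorted(ids) for ids, _ in clusters]


# Made-up sample references; each has only a few attribute values.
refs = [
    {"name": {"Ann Lee"}, "email": {"ann@example.org"}},
    {"name": {"A. Lee"}, "email": {"ann@example.org"}, "phone": {"555-0100"}},
    {"name": {"Ann Lee"}, "phone": {"555-0100"}},
    {"name": {"B. Kim"}, "phone": {"555-0199"}},
]

print(reconcile(refs))
```

In this sample, references 0 and 2 share only a name and references 1 and 2 share only a phone number, so neither pair matches under the assumed rule. Once references 0 and 1 are merged on their shared email, the enriched reference agrees with reference 2 on both name and phone, and the three end up in one cluster.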