On integrating catalogs
- 1 April 2001
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 603-612
- https://doi.org/10.1145/371920.372163
Abstract
We address the problem of integrating documents from different sources into a master catalog. This problem is pervasive in web marketplaces and portals. Current technology for automating this process consists of building a classifier that uses the categorization of documents in the master catalog to construct a model for predicting the category of unknown documents. Our key insight is that many of the data sources have their own categorization, and classification accuracy can be improved by factoring in the implicit information in these source categorizations. We show how a Naive Bayes classification can be enhanced to incorporate the similarity information present in source catalogs. Our analysis and empirical evaluation show substantial improvement in the accuracy of catalog integration.
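The abstract does not spell out how the source categorization enters the classifier, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a multinomial Naive Bayes model with Laplace smoothing trained on the master catalog, and it assumes that the source category acts as a prior adjustment: documents in one source category are first classified on their own, and the resulting vote counts then boost the prior of the master categories that their siblings landed in. The function names (`train`, `integrate`), the `weight` parameter, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(master):
    """Fit a multinomial Naive Bayes model on (category, tokens) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for category, tokens in master:
        doc_counts[category] += 1
        word_counts[category].update(tokens)
        vocab.update(tokens)
    return {"doc_counts": doc_counts, "word_counts": word_counts, "vocab": vocab}


def log_likelihood(model, category, tokens):
    """Laplace-smoothed log Pr(tokens | category); unseen words are skipped."""
    counts = model["word_counts"][category]
    total = sum(counts.values())
    v = len(model["vocab"])
    return sum(
        math.log((counts[t] + 1) / (total + v))
        for t in tokens
        if t in model["vocab"]
    )


def classify(model, tokens, log_prior):
    """Pick the master category maximizing log prior + log likelihood."""
    return max(
        log_prior,
        key=lambda c: log_prior[c] + log_likelihood(model, c, tokens),
    )


def integrate(model, source, weight=1.0):
    """Assign master categories to documents grouped by source category.

    Assumption (not from the abstract): the prior of each master category is
    multiplied by (smoothed share of sibling documents)**weight, where
    siblings are the documents sharing the same source category.
    weight=0 reduces to plain Naive Bayes.
    """
    n = sum(model["doc_counts"].values())
    base = {c: math.log(k / n) for c, k in model["doc_counts"].items()}
    result = {}
    for source_category, docs in source.items():
        # Pass 1: classify each document while ignoring the source category.
        votes = Counter(classify(model, d, base) for d in docs)
        # Pass 2: reclassify with priors boosted by the sibling votes.
        boosted = {
            c: base[c] + weight * math.log((votes[c] + 1) / (len(docs) + len(base)))
            for c in base
        }
        result[source_category] = [classify(model, d, boosted) for d in docs]
    return result


# Toy master catalog and one source category (hypothetical data).
master = [
    ("cameras", ["lens", "zoom", "sensor"]),
    ("cameras", ["lens", "battery"]),
    ("phones", ["screen", "battery", "battery"]),
    ("phones", ["screen", "call"]),
]
source = {"imaging": [["lens", "zoom"], ["sensor", "zoom"], ["lens"], ["battery"]]}

model = train(master)
plain = integrate(model, source, weight=0.0)
enhanced = integrate(model, source, weight=1.0)
```

In this toy data the ambiguous document `["battery"]` goes to `phones` under plain Naive Bayes, but moves to `cameras` once the votes of its three siblings in the same source category are factored in. How much weight the source categorization deserves is a tuning choice; in this sketch it is a free parameter.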