Taxonomy generation for text segments
- 1 October 2005
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Information Systems
- Vol. 23 (4), 363-396
- https://doi.org/10.1145/1095872.1095873
Abstract
It is crucial in many information systems to organize short text segments, such as keywords in documents and queries from users, into a well-formed taxonomy. In this article, we address the problem of taxonomy generation for diverse text segments with a general and practical approach that uses the Web as an additional knowledge source. Unlike long documents, short text segments typically do not contain enough information to extract reliable features. This work investigates the possibilities of using highly ranked search-result snippets to enrich the representation of text segments. A hierarchical clustering algorithm is then designed for creating the hierarchical topic structure of text segments. Text segments with close concepts can be grouped together in a cluster, and relevant clusters linked at the same or nearby levels. Unlike traditional clustering algorithms, which tend to produce cluster hierarchies with a very unnatural shape, the algorithm tries to produce a more natural and comprehensive tree hierarchy. Extensive experiments were conducted on different domains of text segments, including subject terms, people names, paper titles, and natural language questions. The experimental results show the potential of the proposed approach, which provides a basis for the in-depth analysis of text segments on a larger scale and is believed to be able to benefit many information systems.
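The two steps the abstract names, enriching a short segment with search-result snippets and then clustering the enriched representations hierarchically, can be illustrated with a minimal sketch. This is not the article's algorithm: it assumes snippets have already been retrieved (the `SNIPPETS` dictionary is invented for illustration), uses plain term-frequency vectors with cosine similarity, and substitutes ordinary average-linkage agglomerative clustering, which yields exactly the kind of binary tree the abstract contrasts its own method against. All function names are hypothetical.

```python
import math
from collections import Counter


def snippet_vector(snippets):
    """Build an L2-normalised term-frequency vector from snippet strings."""
    counts = Counter(word for s in snippets for word in s.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}


def cosine(u, v):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(weight * v.get(term, 0.0) for term, weight in u.items())


def leaves(tree):
    """List the text segments under a node (a leaf is a plain string)."""
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])


def average_link(a, b, vectors):
    """Mean pairwise similarity between two groups of segments."""
    sims = [cosine(vectors[x], vectors[y]) for x in a for y in b]
    return sum(sims) / len(sims)


def cluster(vectors):
    """Merge the two most similar clusters until a single binary tree remains."""
    trees = list(vectors)  # every segment starts as its own cluster
    while len(trees) > 1:
        pairs = ((i, j) for i in range(len(trees)) for j in range(i + 1, len(trees)))
        i, j = max(
            pairs,
            key=lambda p: average_link(leaves(trees[p[0]]), leaves(trees[p[1]]), vectors),
        )
        merged = (trees[i], trees[j])
        trees = [t for k, t in enumerate(trees) if k not in (i, j)] + [merged]
    return trees[0]


# Invented snippets standing in for highly ranked search results (assumption).
SNIPPETS = {
    "jaguar speed": ["jaguar big cat runs fast", "wild cat speed in the jungle"],
    "leopard habitat": ["leopard big cat lives in jungle", "wild cat habitat"],
    "python tutorial": ["learn python programming language", "python code tutorial for beginners"],
    "java compiler": ["java programming language compiler", "compile java code"],
}

vectors = {segment: snippet_vector(snips) for segment, snips in SNIPPETS.items()}
tree = cluster(vectors)
print(tree)
```

The segments themselves share no words ("jaguar speed" and "leopard habitat" have an empty overlap), so any grouping in this toy example comes entirely from the snippet text, which is the motivation the abstract gives for using the Web as an additional knowledge source.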