A novel word clustering algorithm based on latent semantic analysis
- 24 December 2002
- proceedings article
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 1, 172-175
- https://doi.org/10.1109/icassp.1996.540318
Abstract
A new approach is proposed for the clustering of words in a given vocabulary. The method is based on a paradigm first formulated in the context of information retrieval, called latent semantic analysis. This paradigm leads to a parsimonious vector representation of each word in a suitable vector space, where familiar clustering techniques can be applied. The distance measure selected in this space arises naturally from the problem formulation. Preliminary experiments indicate that the clusters produced are intuitively satisfactory. Because these clusters are semantic in nature, this approach may prove useful as a complement to conventional class-based statistical language modeling techniques.
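The pipeline outlined in the abstract (a low-dimensional vector per word, then a familiar clustering technique) can be illustrated with a minimal sketch. The abstract does not specify how the matrix is built, which distance measure is used, or which clustering algorithm is applied, so everything below is an assumption made for illustration: a toy word-by-document count matrix, a truncated SVD, cosine similarity, and a simple k-means-style loop. The function names `lsa_word_vectors` and `cosine_kmeans` are hypothetical and do not come from the paper.

```python
import numpy as np

def lsa_word_vectors(counts, rank):
    """Map each word (a row of a word-by-document matrix) to a rank-dimensional vector."""
    # Truncated SVD: keep the `rank` largest singular values and left singular vectors.
    U, s, _ = np.linalg.svd(counts, full_matrices=False)
    return U[:, :rank] * s[:rank]

def cosine_kmeans(vectors, k, iters=20):
    """Cluster vectors by cosine similarity (assumed measure, not taken from the paper)."""
    # Assumes no all-zero rows, so every vector can be normalized.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    # Deterministic farthest-first initialization: start from the first word,
    # then repeatedly add the word least similar to the centers chosen so far.
    centers = [unit[0]]
    for _ in range(1, k):
        sims = np.max(unit @ np.array(centers).T, axis=1)
        centers.append(unit[np.argmin(sims)])
    centers = np.array(centers)
    for _ in range(iters):
        # Assign each word to its most similar center.
        labels = np.argmax(unit @ centers.T, axis=1)
        # Move each center to the normalized sum of its members.
        for j in range(k):
            members = unit[labels == j]
            if len(members):
                c = members.sum(axis=0)
                centers[j] = c / np.linalg.norm(c)
    return labels

# Toy data (invented for this example): rows are words, columns are documents.
words = ["stock", "bond", "market", "goal", "team", "coach"]
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float)

labels = cosine_kmeans(lsa_word_vectors(counts, rank=2), k=2)
for cluster in sorted(set(labels.tolist())):
    print(cluster, [w for w, l in zip(words, labels) if l == cluster])
```

In this toy setup, words that occur in the same documents end up pointing in the same direction of the reduced space, so a cosine-based clustering groups them together. A real vocabulary would need a much larger corpus, some weighting of the raw counts, and a considered choice of rank and number of clusters.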