Hierarchical clustering of words

Open Access

1 January 1996

proceedings article
Published by Association for Computational Linguistics (ACL)

Vol. 2, 1159-1162
https://doi.org/10.3115/993268.993390

Abstract

This paper describes a data-driven method for hierarchical clustering of words in which a large vocabulary of English words is clustered bottom-up, with respect to corpora ranging in size from 5 to 50 million words, using a greedy algorithm that tries to minimize the average loss of mutual information of adjacent classes. The resulting hierarchical clusters of words are then naturally transformed into a bit-string representation of (i.e. word bits for) all the words in the vocabulary. Introducing word bits into the ATR Decision-Tree POS Tagger is shown to significantly reduce the tagging error rate. Portability of word bits from one domain to another is also discussed.
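
The two steps named in the abstract, greedy bottom-up merging and conversion of the resulting hierarchy to bit strings, can be illustrated with a small sketch. It is not the paper's implementation: the function names (`average_mutual_information`, `cluster`, `word_bits`) are hypothetical, the objective is assumed to be the average mutual information of adjacent class pairs computed from bigram counts, and every candidate merge is tried by brute force, which is only feasible for toy vocabularies. The bit strings here follow the merge tree directly, so their lengths vary from word to word.

```python
import math
from collections import Counter
from itertools import combinations


def average_mutual_information(bigrams, assign):
    """Average mutual information of adjacent classes (assumed objective).

    bigrams: Counter mapping (word1, word2) to a count.
    assign:  dict mapping each word to its class id.
    """
    pair, left, right = Counter(), Counter(), Counter()
    total = 0
    for (w1, w2), n in bigrams.items():
        c1, c2 = assign[w1], assign[w2]
        pair[(c1, c2)] += n
        left[c1] += n
        right[c2] += n
        total += n
    return sum(
        (n / total) * math.log2(n * total / (left[c1] * right[c2]))
        for (c1, c2), n in pair.items()
    )


def cluster(tokens):
    """Greedily merge classes bottom-up; return a nested-tuple binary tree."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = sorted(set(tokens))
    assign = {w: i for i, w in enumerate(vocab)}  # start: one class per word
    tree = {i: w for i, w in enumerate(vocab)}    # class id -> subtree
    next_id = len(vocab)
    while len(tree) > 1:
        best = None
        for a, b in combinations(sorted(tree), 2):
            # Tentatively merge classes a and b into a new class.
            trial = {w: (next_id if c in (a, b) else c)
                     for w, c in assign.items()}
            score = average_mutual_information(bigrams, trial)
            # Keeping the most mutual information = losing the least.
            if best is None or score > best[0]:
                best = (score, a, b)
        _, a, b = best
        assign = {w: (next_id if c in (a, b) else c)
                  for w, c in assign.items()}
        tree[next_id] = (tree.pop(a), tree.pop(b))
        next_id += 1
    return next(iter(tree.values()))


def word_bits(tree, prefix=""):
    """Read bit strings off the tree: left branch is '0', right is '1'."""
    if isinstance(tree, str):
        return {tree: prefix or "0"}
    left, right = tree
    bits = word_bits(left, prefix + "0")
    bits.update(word_bits(right, prefix + "1"))
    return bits
```

Calling `word_bits(cluster(tokens))` on a tokenized corpus returns a dictionary from each word to its bit string. Words merged early share long prefixes, which is what lets a decision-tree tagger ask questions about individual bits instead of about whole words.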

Keywords

WORD BITS
MILLION WORD
ATR DECISION-TREE POS TAGGER
HIERARCHICAL CLUSTERING
ADJACENT CLASS
LARGE VOCABULARY
WORD BIT
HIERARCHICAL CLUSTER
INTRODUCING WORD BIT
ENGLISH WORD
DECISION TREE
BOTTOM UP
ERROR RATE
