Abstract
This paper describes a data-driven method for hierarchical clustering of words. A large vocabulary of English words is clustered bottom-up, with respect to corpora ranging in size from 5 to 50 million words, using a greedy algorithm that tries to minimize the average loss of mutual information of adjacent classes. The resulting hierarchical clusters of words are then naturally transformed into a bit-string representation of (i.e., word bits for) all the words in the vocabulary. Introducing word bits into the ATR Decision-Tree POS Tagger is shown to significantly reduce the tagging error rate. Portability of word bits from one domain to another is also discussed.
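
As an illustration of how a hierarchy of clusters can be turned into bit strings, the following minimal sketch assumes the hierarchy is a binary tree written as nested tuples and encodes each word as the sequence of left (0) and right (1) branches on the path from the root. The function name `word_bits` and the example tree are hypothetical and are not taken from the paper; the clustering step itself is not shown.

```python
def word_bits(tree, prefix=""):
    """Map each leaf word to its root-to-leaf path, with 0 = left and 1 = right.

    `tree` is either a word (str) or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        # Leaf: the accumulated path is this word's bit string.
        return {tree: prefix}
    left, right = tree
    bits = word_bits(left, prefix + "0")
    bits.update(word_bits(right, prefix + "1"))
    return bits


# Hypothetical toy hierarchy, for illustration only.
tree = ((("monday", "tuesday"), "week"), ("run", "walk"))
bits = word_bits(tree)

for word, code in sorted(bits.items(), key=lambda item: item[1]):
    print(code, word)
```

Under this encoding, words that share a long bit-string prefix sit in the same subtree, so a prefix of a given length identifies a cluster at the corresponding level of the hierarchy.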
