Distributional Clustering of English Words

Preprint

22 August 1994

Published on arXiv

https://doi.org/10.48550/arXiv.cmp-lg/9408011

Abstract

We describe and experimentally evaluate a method for automatically clustering words according to their distribution in particular syntactic contexts. Deterministic annealing is used to find lowest-distortion sets of clusters. As the annealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchical "soft" clustering of the data. Clusters are used as the basis for class models of word co-occurrence, and the models are evaluated with respect to held-out test data.
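
The annealing procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes relative entropy (KL divergence) as the distortion measure and a membership-weighted average as the centroid update, neither of which the abstract specifies. It also keeps the number of clusters fixed instead of growing a hierarchy. All names (`kl`, `soft_assign`, `update_centroids`, `anneal`, `TOY_DISTS`) and the toy data are hypothetical.

```python
import math


def kl(p, q):
    """KL divergence D(p || q) in nats; q must be positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def soft_assign(dists, centroids, beta):
    """Soft memberships: p(c | w) proportional to exp(-beta * D(p_w || p_c))."""
    memberships = []
    for p in dists:
        d = [kl(p, c) for c in centroids]
        d_min = min(d)  # shift exponents for numerical stability
        weights = [math.exp(-beta * (di - d_min)) for di in d]
        total = sum(weights)
        memberships.append([w / total for w in weights])
    return memberships


def update_centroids(dists, memberships):
    """Each centroid becomes the membership-weighted average distribution."""
    dim = len(dists[0])
    centroids = []
    for c in range(len(memberships[0])):
        mass = sum(m[c] for m in memberships)
        centroids.append([
            sum(m[c] * p[y] for m, p in zip(memberships, dists)) / mass
            for y in range(dim)
        ])
    return centroids


def anneal(dists, n_clusters, betas, inner_iters=50, eps=1e-3):
    """Run the two update steps while raising the annealing parameter beta."""
    dim = len(dists[0])
    mean = [sum(p[y] for p in dists) / len(dists) for y in range(dim)]
    centroids = [list(mean) for _ in range(n_clusters)]  # all start at the mean
    for beta in betas:
        # Assumed heuristic: nudge centroid k toward word k so that centroids
        # which still coincide are able to separate once beta is large enough.
        centroids = [
            [(1 - eps) * c[y] + eps * dists[k % len(dists)][y] for y in range(dim)]
            for k, c in enumerate(centroids)
        ]
        for _ in range(inner_iters):
            memberships = soft_assign(dists, centroids, beta)
            centroids = update_centroids(dists, memberships)
    return centroids, soft_assign(dists, centroids, betas[-1])


# Made-up data: four "words", each a distribution over three "contexts".
TOY_DISTS = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.3, 0.6],
]
```

Calling `anneal(TOY_DISTS, 2, [1, 2, 4, 8, 16, 32])` returns the final centroids and the soft memberships at the last value of `beta`. In this toy setting the two centroids stay together at small `beta` and separate at larger values, so the first two words end up in one cluster and the last two in the other.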

Keywords

DISTRIBUTIONAL CLUSTERING
AUTOMATICALLY CLUSTERING
CLUSTERING OF THE DATA
SYNTACTIC CONTEXTS
MODELS EVALUATED
ENGLISH WORDS
ANNEALING
CLUSTERS
WORDS

All Related Versions

Version 1, 1994-08-22, ArXiv
