Multi-class composite N-gram based on connection direction
- 1 January 1999
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 1 (ISSN 1520-6149), pp. 533-536
- https://doi.org/10.1109/icassp.1999.758180
Abstract
A new word-clustering technique is proposed to efficiently build statistically salient class 2-grams from language corpora. By splitting word neighboring characteristics into word-preceding and word-following directions, multiple (two-dimensional) word classes are assigned to each word. On each side, word classes are merged into larger clusters independently, according to the preceding or following word distributions. This word clustering can provide more efficient and statistically reliable word clusters. Further, we extend it to a multi-class composite N-gram whose units are multi-class 2-grams and joined words. The multi-class composite N-gram showed better performance in both perplexity and recognition rates, with one thousandth smaller size than conventional word 2-grams.
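The direction-dependent clustering described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy corpus, the function names (`neighbor_sets`, `cluster_by_signature`, `train`), and the clustering rule, which simply groups words with identical neighbor sets as a stand-in for the paper's distribution-based merging. The factorization P(w | v) ≈ P(C_to(w) | C_from(v)) · P(w | C_to(w)) is the usual class 2-gram form, assumed here with a separate class per direction; the abstract itself does not state the formula.

```python
from collections import Counter, defaultdict


def neighbor_sets(sentences):
    """For every word, collect the words seen directly before and after it."""
    prev_of, next_of = defaultdict(set), defaultdict(set)
    for sent in sentences:
        for left, right in zip(sent, sent[1:]):
            next_of[left].add(right)
            prev_of[right].add(left)
    return prev_of, next_of


def cluster_by_signature(words, neighbors):
    """Toy clustering (assumption): identical neighbor sets share a class id."""
    ids = {}
    return {w: ids.setdefault(frozenset(neighbors[w]), len(ids)) for w in words}


def train(sentences):
    """Build a multi-class 2-gram with one class per connection direction."""
    vocab = sorted({w for s in sentences for w in s})
    prev_of, next_of = neighbor_sets(sentences)
    # "to" class: how a word behaves as the predicted word (preceding context).
    to_class = cluster_by_signature(vocab, prev_of)
    # "from" class: how a word behaves as the context word (following words).
    from_class = cluster_by_signature(vocab, next_of)

    class_bigram, from_count = Counter(), Counter()
    to_count, word_count = Counter(), Counter()
    for sent in sentences:
        for left, right in zip(sent, sent[1:]):
            class_bigram[from_class[left], to_class[right]] += 1
            from_count[from_class[left]] += 1
            to_count[to_class[right]] += 1
            word_count[right] += 1

    def prob(word, context):
        """P(word | context) ~ P(C_to(word) | C_from(context)) * P(word | C_to(word))."""
        cf, ct = from_class[context], to_class[word]
        if from_count[cf] == 0 or to_count[ct] == 0:
            return 0.0
        return (class_bigram[cf, ct] / from_count[cf]) * (word_count[word] / to_count[ct])

    return prob, to_class, from_class


# Hypothetical toy corpus, used only to exercise the sketch.
sentences = [
    ["a", "cat", "runs"],
    ["the", "cat", "runs"],
    ["a", "dog", "runs"],
    ["the", "dog", "sleeps"],
]
prob, to_class, from_class = train(sentences)
print(to_class["cat"] == to_class["dog"])      # same preceding words
print(from_class["cat"] == from_class["dog"])  # different following words
print(prob("cat", "a"))
```

In this toy corpus, "cat" and "dog" follow the same words but precede different ones, so they share a class in one direction and not in the other. A single class per word could not express that, which is the motivation for assigning two-dimensional classes.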
This publication has 3 references indexed in Scilit:
- Variable-order N-gram generation by word-class splitting and consecutive word grouping. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Spontaneous dialogue speech recognition using cross-word context constrained word graphs. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- HMM topology design using maximum likelihood successive state splitting. Computer Speech & Language, 1997