ANGIE: a new framework for speech analysis based on morpho-phonological modelling
- 24 December 2002
- proceedings article
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 1, 110-113
- https://doi.org/10.1109/icslp.1996.607049
Abstract
This paper describes a new system for speech analysis, ANGIE, which characterizes word substructure in terms of a trainable grammar. ANGIE captures morpho-phonemic and phonological phenomena through a hierarchical framework. The terminal categories can be alternately letters or phone units, yielding a reversible letter-to-sound/sound-to-letter system. In conjunction with a segment network and acoustic phone models, the system can produce phonemic-to-phonetic alignments for speech waveforms. For speech recognition, ANGIE uses a one-pass bottom-up best-first search strategy. Evaluated in the ATIS domain, ANGIE achieved a phone error rate of 36%, as compared with 40% achieved with a baseline phone-bigram based recognizer under similar conditions. ANGIE potentially offers many attractive features, including dynamic vocabulary adaptation, as well as a framework for handling unknown words.
1. OVERVIEW
In this paper we propose a methodology for incorporating multiple sublexical linguistic phenomena (including phonology, syllabification and morphology) into a single framework for representing speech and language. Together with a trainable probabilistic parser, this unified framework provides a viable paradigm for multiple tasks: letter-to-sound/sound-to-letter generation, phoneme-to-phone alignment, and speech recognition. We hope that such a unified framework promotes shared usage of the sublexical constraints amongst the different applications, which should facilitate the search processes and also make it easier to deal with out-of-vocabulary words and to add new words dynamically.
A preliminary system based on this paradigm, which we call ANGIE, has been under development in our group over the past year. Context-free rules are written by hand to generate a hierarchical tree representation, as illustrated in Figure 1. These trees are used to train the probabilities of the parser, which are later used in each of our three applications.
The structure consists of five regular layers below the root SENTENCE node. Each word in the sentence is represented by a WORD node in the second layer. The remaining layers capture, in order, morphology, syllabification, phonemes, and
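As a rough illustration of the layered structure described here, the following Python sketch builds a small tree with one node per layer on every root-to-leaf path and checks that regularity. It is not the ANGIE implementation: the node labels (ROOT, ONSET, NUCLEUS, /n/, /ow/), the example word, and all function names are hypothetical, and the final "terminal" layer is assumed to hold letters or phones as stated in the abstract.

```python
from dataclasses import dataclass, field
from typing import List

# Layer order as given in the text: a root SENTENCE node plus five regular layers.
# "terminal" is an assumed name for the last layer (letters or phones, per the abstract).
LAYERS = ["sentence", "word", "morphology", "syllabification", "phoneme", "terminal"]


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def is_regular(node: Node, depth: int = 0) -> bool:
    """Return True if every root-to-leaf path has exactly one node per layer."""
    if not node.children:
        return depth == len(LAYERS) - 1
    return all(is_regular(child, depth + 1) for child in node.children)


def terminals(node: Node) -> List[str]:
    """Collect the leaf labels from left to right."""
    if not node.children:
        return [node.label]
    leaves: List[str] = []
    for child in node.children:
        leaves.extend(terminals(child))
    return leaves


def build_example() -> Node:
    """Hypothetical tree for the word "no" with letter terminals."""
    onset = Node("ONSET", [Node("/n/", [Node("n")])])
    nucleus = Node("NUCLEUS", [Node("/ow/", [Node("o")])])
    return Node("SENTENCE", [Node("WORD", [Node("ROOT", [onset, nucleus])])])


if __name__ == "__main__":
    tree = build_example()
    print(is_regular(tree))
    print("".join(terminals(tree)))
```

Keeping every path the same depth means each node can be addressed by its layer index and position, which is one plausible reason a regular layering is convenient for a probabilistic parser; the source does not spell out this rationale.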