Importance of tonal envelope cues in Chinese speech recognition
- 1 July 1998
- journal article
- conference paper
- Published by Acoustical Society of America (ASA) in The Journal of the Acoustical Society of America
- Vol. 104 (1), 505-510
- https://doi.org/10.1121/1.423251
Abstract
Recent studies have shown that temporal waveform envelope cues can provide significant information for English speech recognition. This study investigated the use of temporal envelope cues in a tonal language, Mandarin Chinese. The speech was divided into several frequency analysis bands; the amplitude envelope was extracted from each band by half-wave rectification and low-pass filtering, and was used to modulate a noise of the same bandwidth as the analysis band. These manipulations preserved the temporal and amplitude cues in each frequency band but removed the spectral detail within each band. Chinese vowels, consonants, tones, and sentences were identified by 12 native Chinese-speaking listeners with 1, 2, 3, and 4 noise bands. The recognition scores for vowels, consonants, and sentences increased monotonically with the number of bands, a pattern similar to that observed in English speech recognition. In contrast, tones were consistently recognized at about 80% correct, independent of the number of bands. This high level of tone recognition produced a significant difference in open-set sentence recognition between Chinese (11.0%) and English (2.9%) in the one-band condition, where no spectral information was available. The data also revealed that, with primarily temporal cues, the falling–rising tone (tone 3) and the falling tone (tone 4) were more easily recognized than the flat tone (tone 1) and the rising tone (tone 2). This differential pattern in tone recognition produced a similar pattern in word recognition: words carrying tone 3 or 4 were more likely to be recognized, whereas words carrying tone 1 or 2 were not. The quantitative role of tones in Chinese speech recognition was further explored with a power-function model, which showed that tones play a significant role in relating phoneme recognition to sentence recognition.
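The band-envelope-noise processing described in the abstract is commonly called a noise-band vocoder. Below is a minimal sketch in Python with NumPy, assuming zero-phase brick-wall FFT filters, log-spaced band edges between 100 and 4000 Hz, and a 160 Hz envelope cutoff. These are illustrative choices, not the study's actual filter specifications, and the function names `fft_filter` and `noise_vocoder` are hypothetical.

```python
import numpy as np


def fft_filter(x, fs, lo, hi):
    """Zero-phase brick-wall filter: keep FFT bins with lo <= f < hi (Hz).

    A simplification for illustration; the study's filter design may differ.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def noise_vocoder(x, fs, n_bands, f_lo=100.0, f_hi=4000.0,
                  env_cutoff_hz=160.0, seed=0):
    """Replace each band's spectral detail with envelope-modulated noise.

    f_lo, f_hi, env_cutoff_hz and the log spacing of the bands are
    hypothetical defaults, not values taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Log-spaced analysis band edges (assumption).
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = fft_filter(x, fs, lo, hi)
        # Envelope extraction: half-wave rectification, then low-pass filtering.
        env = fft_filter(np.maximum(band, 0.0), fs, 0.0, env_cutoff_hz)
        env = np.maximum(env, 0.0)  # brick-wall ringing can dip below zero
        # Noise carrier with the same bandwidth as the analysis band.
        carrier = fft_filter(rng.standard_normal(len(x)), fs, lo, hi)
        out += env * carrier
    # Match the overall RMS level of the input (illustrative normalisation).
    rms_in = np.sqrt(np.mean(x ** 2))
    rms_out = np.sqrt(np.mean(out ** 2))
    return out * (rms_in / rms_out) if rms_out > 0 else out
```

Calling `noise_vocoder(x, fs, n)` with `n` from 1 to 4 mirrors the four band conditions. With `n = 1`, only the broadband temporal envelope survives, which corresponds to the condition in which the abstract reports tone recognition still near 80% correct.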