Importance of temporal-envelope cues in consonant recognition

Abstract
The role of different modulation frequencies in the speech envelope were studied by means of the manipulation of vowel–consonant–vowel (VCV) syllables. The envelope of the signal was extracted from the speech and the fine-structure was replaced by speech-shaped noise. The temporal envelopes in every critical band of the speech signal were notch filtered in order to assess the relative importance of different modulation frequency regions between 0 and 20 Hz. For this purpose notch filters around three center frequencies (8, 12, and 16 Hz) with three different notch widths (4-, 8-, and 12-Hz wide) were used. These stimuli were used in a consonant-recognition task in which ten normal-hearing subjects participated, and their results were analyzed in terms of recognition scores. More qualitative information was obtained with a multidimensional scaling method (INDSCAL) and sequential information analysis (SINFA). Consonant recognition is very robust for the removal of certain modulation frequency areas. Only when a wide notch around 8 Hz is applied does the speech signal become heavily degraded. As expected, the voicing information is lost, while there are different effects on plosiveness and nasality. Even the smallest filtering has a substantial effect on the transfer of the plosiveness feature, while on the other hand, filtering out only the low-modulation frequencies has a substantial effect on the transfer of nasality cues.

This publication has 10 references indexed in Scilit: