The Modulation Transfer Function for Speech Intelligibility
Open Access
- 6 March 2009
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 5 (3) , e1000302
- https://doi.org/10.1371/journal.pcbi.1000302
Abstract
We systematically determined which spectrotemporal modulations in speech are necessary for comprehension by human listeners. Speech comprehension has been shown to be robust to spectral and temporal degradations, but the specific relevance of particular degradations is arguable due to the complexity of the joint spectral and temporal information in the speech signal. We applied a novel modulation filtering technique to recorded sentences to restrict acoustic information quantitatively and to obtain a joint spectrotemporal modulation transfer function for speech comprehension, the speech MTF. For American English, the speech MTF showed the criticality of low modulation frequencies in both time and frequency. Comprehension was significantly impaired when temporal modulations <12 Hz or spectral modulations <4 cycles/kHz were removed. More specifically, the MTF was bandpass in temporal modulations and low-pass in spectral modulations: temporal modulations from 1 to 7 Hz and spectral modulations <1 cycles/kHz were the most important. We evaluated the importance of spectrotemporal modulations for vocal gender identification and found a different region of interest: removing spectral modulations between 3 and 7 cycles/kHz significantly increases gender misidentifications of female speakers. The determination of the speech MTF furnishes an additional method for producing speech signals with reduced bandwidth but high intelligibility. Such compression could be used for audio applications such as file compression or noise removal and for clinical applications such as signal processing for cochlear implants.
Author Summary
The sound signal of speech is rich in temporal and frequency patterns. These fluctuations of power in time and frequency are called modulations. Despite their acoustic complexity, spoken words remain intelligible after drastic degradations in either time or frequency. To fully understand the perception of speech and to be able to reduce speech to its most essential components, we need to completely characterize how modulations in amplitude and frequency contribute together to the comprehensibility of speech. Hallmark research distorted speech in either time or frequency but described the arbitrary manipulations in terms limited to one domain or the other, without quantifying the remaining and missing portions of the signal. Here, we use a novel sound filtering technique to systematically investigate the joint features in time and frequency that are crucial for understanding speech. Both the modulation-filtering approach and the resulting characterization of speech have the potential to change the way that speech is compressed in audio engineering and how it is processed in medical applications such as cochlear implants.
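To make the idea of removing modulations above a cutoff concrete, the following is a minimal sketch, not the authors' implementation. It assumes the sound is already represented as a log-amplitude spectrogram on a linear frequency axis, takes its 2D Fourier transform to obtain a joint modulation spectrum, zeroes the components beyond chosen temporal (Hz) and spectral (cycles/kHz) cutoffs, and transforms back. The function name `modulation_lowpass`, its parameters, and the synthetic ripple input are all hypothetical, and resynthesizing a waveform from the filtered spectrogram is not shown.

```python
import numpy as np


def modulation_lowpass(log_spec, frames_per_s, bins_per_khz,
                       max_temporal_hz, max_spectral_cyc_per_khz):
    """Low-pass filter a log spectrogram in the joint modulation domain.

    log_spec has shape (n_freq_bins, n_frames); the frequency axis is
    assumed linear, with bins_per_khz bins per kHz (an assumption of this
    sketch, not a detail taken from the paper).
    """
    n_f, n_t = log_spec.shape
    # Modulation axes: cycles/kHz along frequency, Hz along time.
    spectral_mod = np.fft.fftfreq(n_f, d=1.0 / bins_per_khz)
    temporal_mod = np.fft.fftfreq(n_t, d=1.0 / frames_per_s)
    # Keep only components inside both cutoffs (mask is symmetric in sign).
    keep = ((np.abs(spectral_mod)[:, None] <= max_spectral_cyc_per_khz)
            & (np.abs(temporal_mod)[None, :] <= max_temporal_hz))
    mod_spec = np.fft.fft2(log_spec)
    return np.real(np.fft.ifft2(mod_spec * keep))


# Synthetic input: two spectrotemporal ripples on a 1 s by 8 kHz grid.
t = np.arange(100) / 100.0   # seconds, 100 frames per second
f = np.arange(80) / 10.0     # kHz, 10 bins per kHz
slow = np.cos(2 * np.pi * (4.0 * t[None, :] + 0.5 * f[:, None]))   # 4 Hz, 0.5 cycles/kHz
fast = np.cos(2 * np.pi * (20.0 * t[None, :] + 3.0 * f[:, None]))  # 20 Hz, 3 cycles/kHz

# Cutoffs chosen for illustration only.
filtered = modulation_lowpass(slow + fast, frames_per_s=100, bins_per_khz=10,
                              max_temporal_hz=12.0, max_spectral_cyc_per_khz=1.0)
```

With these illustrative cutoffs, the slow ripple falls inside the pass region and the fast ripple falls outside it, so `filtered` should closely match `slow`. Other filter shapes, such as the notch between 3 and 7 cycles/kHz mentioned in the abstract, would only need a different `keep` mask.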