Simulation of talking faces in the human brain improves auditory speech recognition
Open Access
- 6 May 2008
- Research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 105 (18), 6747–6752
- https://doi.org/10.1073/pnas.0710826105
Abstract
Human face-to-face communication is essentially audiovisual. Typically, people talk to us face-to-face, providing concurrent auditory and visual input. Understanding someone is easier when there is visual input, because visual cues like mouth and tongue movements provide complementary information about speech content. Here, we hypothesized that, even in the absence of visual input, the brain optimizes both auditory-only speech and speaker recognition by harvesting speaker-specific predictions and constraints from distinct visual face-processing areas. To test this hypothesis, we performed behavioral and neuroimaging experiments in two groups: subjects with a face recognition deficit (prosopagnosia) and matched controls. The results show that observing a specific person talking for 2 min improves subsequent auditory-only speech and speaker recognition for this person. In both prosopagnosics and controls, behavioral improvement in auditory-only speech recognition was based on an area typically involved in face-movement processing. Improvement in speaker recognition was only present in controls and was based on an area involved in face-identity processing. These findings challenge current unisensory models of speech processing, because they show that, in auditory-only speech, the brain exploits previously encoded audiovisual correlations to optimize communication. We suggest that this optimization is based on speaker-specific audiovisual internal models, which are used to simulate a talking face.
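The internal-model idea can be made concrete with a toy sketch. The Python snippet below is not the authors' model; it is a minimal illustration, assuming that the audiovisual exposure phase yields a Gaussian speaker-specific prior over a single articulatory parameter, which is later fused with a noisy auditory-only observation via a one-dimensional precision-weighted (Kalman-style) update. All variable names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: during audiovisual exposure, the listener learns a
# speaker-specific prior over one articulatory parameter. In later
# auditory-only listening, that prior is combined with the noisy acoustic
# observation (simple Gaussian fusion, a toy stand-in for the paper's
# "internal model" idea).
true_value = 1.2                   # speaker's characteristic parameter
prior_mean, prior_var = 1.2, 0.05  # prior learned from audiovisual exposure
obs_var = 0.4                      # auditory-only observation noise

def fuse(obs, prior_mean, prior_var, obs_var):
    """Precision-weighted fusion of prior and observation (1-D Kalman update)."""
    k = prior_var / (prior_var + obs_var)       # Kalman gain
    mean = prior_mean + k * (obs - prior_mean)  # posterior mean
    var = (1 - k) * prior_var                   # posterior variance
    return mean, var

# Compare estimation error with and without the learned prior.
obs = true_value + rng.normal(0.0, np.sqrt(obs_var), size=10_000)
fused, _ = fuse(obs, prior_mean, prior_var, obs_var)

print("auditory-only RMSE: ", np.sqrt(np.mean((obs - true_value) ** 2)))
print("with learned prior: ", np.sqrt(np.mean((fused - true_value) ** 2)))
```

In this toy setting, the learned speaker-specific prior lowers estimation error relative to the raw auditory observation, mirroring the behavioral benefit described in the abstract; the real mechanism the paper proposes is of course far richer than one-dimensional Gaussian fusion.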