Low-Level Information and High-Level Perception: The Case of Speech in Noise

Abstract
Auditory information is processed in a fine-to-crude hierarchical scheme, from low-level acoustic information to high-level abstract representations, such as phonological labels. We now ask whether fine acoustic information, which is not retained at high levels, can still be used to extract speech from noise. Previous theories suggested either full availability of low-level information or availability that is limited by task difficulty. We propose a third alternative, based on the Reverse Hierarchy Theory (RHT), originally derived to describe the relations between the processing hierarchy and visual perception. RHT asserts that only the higher levels of the hierarchy are immediately available for perception. Direct access to low-level information requires specific conditions, and can be achieved only at the cost of concurrent comprehension. We tested the predictions of these three views in a series of experiments in which we measured the benefits from utilizing low-level binaural information for speech perception, and compared it to that predicted from a model of the early auditory system. Only auditory RHT could account for the full pattern of the results, suggesting that similar defaults and tradeoffs underlie the relations between hierarchical processing and perception in the visual and auditory modalities. One of the central questions in sensory neuroscience is the determination of the maximal amount of task-relevant information that is encoded in our brain. It is often assumed that all of this information is available for making perceptual decisions. We now show that this assumption does not hold generally. We find that when discriminating or understanding speech masked by noise, only the information that is represented at higher cortical areas is generally accessible for perception. Thus, when we need to decide whether the speaker said “day” or “night,” we are likely to succeed in this discrimination. However, when fine discriminations are required (e.g., “day” vs. “bay”), the information regarding the fine spectral and temporal details, which are necessary to discriminate these two words, can be fully utilized only under special conditions. These conditions include, for example, systematic repetitions of the stimuli, as often done in psychoacoustic experiments, or when one eliminates the need for comprehension and focuses on mere identification. These conditions are nonecological, and are not afforded in most daily situations.

This publication has 89 references indexed in Scilit: