Abstract
Speech is a highly redundant signal. This redundancy is important for reliable communication over air pathways, but a large part of it is useless for speech communication over digital channels. Speech coding aims at minimizing the information rate needed to reproduce a speech signal with specified fidelity. In this paper, we discuss factors that influence the design of efficient speech coders. The encoding and decoding processes invariably introduce error (noise and distortion) into the speech signal. The inability of the human ear to hear certain kinds of distortion in the speech signal plays a crucial role in producing high-quality speech at low bit rates. The physical difference between the waveforms of a given speech signal and its coded replica generally does not tell us much about the subjective quality of the coded signal. A signal-to-noise ratio as low as 10 dB can be tolerated in the coded signal, provided the errors are distributed in both the time and frequency domains where they are least audible. Recent work on auditory masking has provided new insights for optimizing the performance of speech coders. This paper reviews this work and discusses new speech coding methods that attempt to maximize the perceptual similarity between the original speech signal and its coded replica. These new methods make it possible to reproduce speech signals at very low bit rates with little or no audible distortion.
