Subjective speech-to-noise ratio as a measure of speech quality for digital waveform coders

Abstract
The ultimate performance measure for evaluating voice communication systems is the subjective quality of the received speech. Modern digital speech-coding techniques achieve high intelligibility and significant transmission economies. High speech intelligibility is a necessary but insufficient condition for user acceptance of these systems; quality must also meet acceptability criteria. No adequate single measure of overall speech quality has yet been developed. This work takes a utilitarian approach in attempting to satisfy the urgent requirement for a practical measurement method. The subjective speech-to-noise ratio (SNR), derived from the forced-choice pair-comparison test using the psychometric analysis procedure commonly used in the method of constants, was evaluated. A speech signal degraded by varying amounts of multiplicative white noise was selected as the reference system in the test. Seven types of digital speech coders were simulated and evaluated, including log-PCM [pulse code modulation], ADM [adaptive delta modulation], ADPCM [adaptive differential pulse code modulation] coders with variable or fixed predictor, APC [adaptive predictor coder], and residual-excited and pitch-excited LP [linear predictor] coders (RELP [residual-excited linear prediction coder] and LPC [pitch-excited linear predictor coder]). Thirteen configurations of these coders, covering transmission bit rates of 2.4 to 64 kb/s [kilobits per second], were included. Pair-comparison tests were conducted in two separate sessions 14 months apart using different groups of speakers and listeners. The subjective SNR estimated from the 13 coder configurations ranged from 7 to 40 dB and represented overall speech quality well in a single dimension. No significant speaker or listener variation was found for a wide range of waveform coders. The subjective SNR estimate was highly reproducible with different speakers and listeners.
Arbitrary selection of as few as 5 listeners yielded a stable subjective SNR estimate for the waveform coders. Highly significant listener variation was found for the narrow-band digital vocoders (RELP and LPC). This listener variability reflected a limitation of the measure that may prevent its extension to vocoded speech whose distortions differ significantly from those of the reference speech.
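
The measurement idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure. It assumes that the multiplicative-noise reference has the form x[n](1 + g w[n]), with w[n] unit-variance white Gaussian noise and g = 10^(-SNR/20), and that the method-of-constants analysis can be approximated by an unweighted least-squares fit of a cumulative normal to probit-transformed preference proportions. The function names, the clipping constant eps, and the preference proportions are all hypothetical.

```python
import math
import random
from statistics import NormalDist


def add_multiplicative_noise(speech, snr_db, rng):
    """Return speech[n] * (1 + g * w[n]), w being unit-variance white Gaussian noise.

    Assumed form of the reference degradation: g = 10 ** (-snr_db / 20) makes
    the expected speech-to-noise ratio equal snr_db at every signal level.
    """
    gain = 10.0 ** (-snr_db / 20.0)
    return [x * (1.0 + gain * rng.gauss(0.0, 1.0)) for x in speech]


def estimate_subjective_snr(reference_snrs_db, p_reference_preferred, eps=0.01):
    """Fit a cumulative-normal psychometric function and return (mu, sigma).

    mu is the reference SNR at which reference and coder are each preferred
    half the time (the subjective SNR of the coder); sigma is the spread of
    the fitted curve. Simplification: unweighted least squares on probits.
    """
    std_normal = NormalDist()
    # Clip proportions away from 0 and 1 so the probit transform stays finite.
    z = [std_normal.inv_cdf(min(max(p, eps), 1.0 - eps))
         for p in p_reference_preferred]
    n = len(z)
    mean_s = sum(reference_snrs_db) / n
    mean_z = sum(z) / n
    sxx = sum((s - mean_s) ** 2 for s in reference_snrs_db)
    sxz = sum((s - mean_s) * (zi - mean_z)
              for s, zi in zip(reference_snrs_db, z))
    slope = sxz / sxx                  # equals 1 / sigma
    intercept = mean_z - slope * mean_s
    return -intercept / slope, 1.0 / slope


if __name__ == "__main__":
    rng = random.Random(0)
    # A 200 Hz tone sampled at 8 kHz stands in for a speech signal here.
    tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
    reference = add_multiplicative_noise(tone, 20.0, rng)

    # Hypothetical data: fraction of trials in which listeners preferred
    # the noisy reference over the coder, at each reference SNR.
    levels_db = [10.0, 15.0, 20.0, 25.0, 30.0]
    proportions = [0.05, 0.20, 0.45, 0.80, 0.95]
    mu, sigma = estimate_subjective_snr(levels_db, proportions)
    print(f"subjective SNR ~ {mu:.1f} dB (spread {sigma:.1f} dB)")
```

Under these assumptions, the 50% point of the fitted curve plays the role of the coder's subjective SNR. A weighted probit or maximum-likelihood fit would be the more careful choice when the number of trials per level is small or uneven.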
