Abstract
Several techniques for speech coding at rates of 4 kb/s and lower (for example, multiband excitation coding, sinusoidal transform coding, and time-frequency interpolation) require quantization of spectral magnitudes at a set of frequencies that are harmonics of the talker's fundamental (pitch) frequency. The number of harmonic magnitudes to be quantized depends on the fundamental frequency and is therefore variable, changing from frame to frame. This variable dimension makes it difficult to apply fixed-dimension vector quantization to harmonic magnitude encoding. In this paper, we introduce a quantization technique called non-square transform vector quantization (NSTVQ), which combines a fixed-dimension vector quantizer with a variable-size non-square transform that maps the variable-dimension harmonic magnitude vector into a fixed-dimension vector. The optimal reconstruction procedure for non-square transforms is derived and shown to be equivalent to an optimal least-squares estimation procedure. The proposed technique is evaluated experimentally as part of a new coding system called spectral excitation coding (SEC). The results are compared to those of an existing technique that estimates the spectral shape using all-pole modeling followed by vector quantization of the LSP parameters.
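The mapping between a variable-dimension vector and a fixed-dimension one can be illustrated with a minimal sketch. The abstract does not say which transform NSTVQ uses, so the truncated orthonormal DCT-II basis below is an assumption chosen for illustration, as are the function names `dct_basis`, `forward`, and `reconstruct`. With orthonormal basis rows, applying the transpose of the forward transform gives the least-squares estimate of the input from the retained coefficients. The vector quantizer itself is omitted; it would operate on the fixed-dimension output of `forward`.

```python
import math


def dct_basis(num_rows, n):
    """Return the first num_rows rows of the orthonormal n-point DCT-II matrix.

    Assumption: a DCT basis is used purely for illustration; the abstract
    does not specify the transform.
    """
    rows = []
    for k in range(num_rows):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        )
    return rows


def forward(x, m):
    """Map a length-n vector x to a fixed length-m vector.

    Uses min(m, n) orthonormal basis rows and zero-pads when n < m
    (the zero-padding is a hypothetical choice for this sketch).
    """
    n = len(x)
    r = min(m, n)
    basis = dct_basis(r, n)
    y = [sum(b * v for b, v in zip(row, x)) for row in basis]
    return y + [0.0] * (m - r)


def reconstruct(y, n):
    """Least-squares estimate of the length-n vector from the fixed-length y.

    Because the basis rows are orthonormal, the least-squares solution is
    the transpose of the forward transform applied to the coefficients.
    """
    r = min(len(y), n)
    basis = dct_basis(r, n)
    return [sum(basis[k][i] * y[k] for k in range(r)) for i in range(n)]


# Two frames with different numbers of harmonics, both mapped to dimension 8.
short_frame = [3.0, 1.5, 0.7, 0.4, 0.2]
long_frame = [float(12 - i) for i in range(12)]
y_short = forward(short_frame, 8)
y_long = forward(long_frame, 8)
x_short = reconstruct(y_short, len(short_frame))
x_long = reconstruct(y_long, len(long_frame))
```

In this sketch a frame with fewer harmonics than the fixed dimension is recovered exactly (up to rounding), while a frame with more harmonics is recovered as its projection onto the retained basis vectors, which is the sense in which the reconstruction is a least-squares estimate.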
