Segmental prototype interpolation coding

1 January 1999

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 4 (ISSN 1520-6149), pp. 2311-2314
https://doi.org/10.1109/icassp.1999.758400

Abstract

Current parametric speech coding schemes can achieve high communications-quality speech at bit rates in the range of 2.4 to 1.5 kbits/sec. Most schemes sample and quantise, at regular intervals, the "tracks in time" generated by the parameters of the speech production model. As a result, reconstructed "parameter tracks" do not evolve "smoothly" with time. Furthermore, no advantage is taken of the "linguistic event" nature of speech. In this paper, model parameter "time tracks" are split into non-overlapping, speech "event" related segments. These segment-based evolutions of model parameters are then vector quantised to provide a smooth and subjectively meaningful reconstruction at the receiver. The paper presents an application of this generic segmental speech model quantisation approach to a 1.5 kbits/sec prototype interpolation coding (PIC) system. Results indicate that the proposed methodology can almost halve the bit rate of this PIC system while preserving overall recovered speech quality.
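
The segment-based quantisation idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the segment boundaries are supplied by hand rather than detected from speech events, each segment is resampled to a fixed length before quantisation, the three-entry codebook is a toy, and the names `resample`, `nearest`, `encode` and `decode` are hypothetical.

```python
from typing import List, Sequence, Tuple


def resample(values: Sequence[float], n: int) -> List[float]:
    """Linearly interpolate `values` onto `n` evenly spaced points."""
    if len(values) == 1 or n == 1:
        return [float(values[0])] * n
    out = []
    for i in range(n):
        pos = i * (len(values) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * values[lo] + frac * values[hi])
    return out


def nearest(vector: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codebook entry with the smallest squared Euclidean distance."""
    def dist(entry: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def encode(track: Sequence[float], boundaries: Sequence[int],
           codebook: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Return one (codebook index, duration in frames) pair per segment."""
    dim = len(codebook[0])
    symbols = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        shape = resample(track[start:end], dim)  # length-normalised segment
        symbols.append((nearest(shape, codebook), end - start))
    return symbols


def decode(symbols: Sequence[Tuple[int, int]],
           codebook: Sequence[Sequence[float]]) -> List[float]:
    """Stretch each codebook shape back to its segment duration."""
    track: List[float] = []
    for index, duration in symbols:
        track.extend(resample(codebook[index], duration))
    return track


# Toy codebook (assumed): flat, rising and falling segment shapes.
codebook = [[0.0, 0.0, 0.0], [0.0, 0.5, 1.0], [1.0, 0.5, 0.0]]
# One parameter track sampled per frame, with hand-picked segment boundaries.
track = [0.0, 0.0, 0.0, 0.0, 0.1, 0.4, 0.6, 0.9, 1.0, 0.7, 0.3, 0.0]
boundaries = [0, 4, 8, 12]

symbols = encode(track, boundaries, codebook)
reconstructed = decode(symbols, codebook)
print(symbols)
print(reconstructed)
```

In this sketch only one index and one duration are sent per segment instead of one quantised value per frame, and the decoder interpolates within each codebook shape, which is one simple way to obtain a track that varies smoothly inside a segment.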
