Segmental vocoder-going beyond the phonetic approach

Abstract
The problem of very low bit rate segmental speech coding is addressed. The basic units are found automatically in the training database using temporal decomposition, vector quantization and multigrams. They are modelled by HMMs. The coding is based on recognition and synthesis. In single speaker tests, we obtained intelligible and naturally sounding speech at a mean rate of 211.2 b/s. In the end, future extensions of our scheme (diphone-like synthesis and speaker adaptation) as well as possible use of automatically derived units in recognition are discussed.

This publication has 5 references indexed in Scilit: