Extracting speech features from human speech like noise

Abstract
Human speech-like noise (HSLN) is a kind of babble noise generated by superimposing independent speech signals, typically more than one thousand times. Because the basic character of HSLN ranges from that of overlapped speech to that of stationary noise as the number of superpositions varies, while the long-time spectrum keeps the same shape, we use HSLN with various numbers of superpositions to investigate the perceptual discrimination of speech from stationary noise and its acoustic correlates. First, subjective tests confirm that the perceptual score, i.e. how much the HSLN sounds like stationary noise, is proportional to the number of superpositions. Then we show that the amplitude distribution of the difference signal of HSLN approaches a Gaussian distribution from a Gamma distribution as the number of superpositions increases. A second subjective test, in which listeners judged three HSLN signals with different dynamic characteristics, clarifies that the temporal change of the spectral envelope plays an important role in discriminating speech from noise.
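
The superposition idea and the drift of the difference signal toward a Gaussian can be illustrated with a minimal sketch. It is not the authors' procedure: no speech recordings are used, and heavy-tailed Laplacian noise serves as a hypothetical stand-in for a single speech signal. The function names (surrogate_speech, make_hsln, difference_kurtosis) and the use of excess kurtosis as a measure of distance from a Gaussian are assumptions made for this illustration only.

```python
import numpy as np


def surrogate_speech(n_samples, rng):
    """Hypothetical stand-in for one speech signal (NOT real speech):
    heavy-tailed Laplacian noise, chosen only because its amplitude
    distribution is clearly non-Gaussian."""
    return rng.laplace(0.0, 1.0, n_samples)


def make_hsln(n_superpositions, n_samples, rng):
    """Superimpose independent surrogate signals, as the abstract describes
    for HSLN, and rescale so the variance does not grow with the count."""
    total = np.zeros(n_samples)
    for _ in range(n_superpositions):
        total += surrogate_speech(n_samples, rng)
    return total / np.sqrt(n_superpositions)


def excess_kurtosis(x):
    """Excess kurtosis: 0 for a Gaussian, positive for heavier tails."""
    x = x - x.mean()
    return float(np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0)


def difference_kurtosis(n_superpositions, n_samples=20_000, seed=0):
    """Excess kurtosis of the first-difference signal of the surrogate HSLN."""
    rng = np.random.default_rng(seed)
    hsln = make_hsln(n_superpositions, n_samples, rng)
    return excess_kurtosis(np.diff(hsln))


if __name__ == "__main__":
    for n in (1, 4, 16, 64, 256):
        print(n, round(difference_kurtosis(n), 3))
```

Under these assumptions the excess kurtosis of the difference signal shrinks toward zero as the number of superpositions grows, which is the central limit effect behind the Gaussian tendency reported above. With real speech the samples are strongly correlated in time, so the rate of convergence would differ from this surrogate.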
