Toward a model for lexical access based on acoustic landmarks and distinctive features

1 April 2002

journal article
research article
Published by Acoustical Society of America (ASA) in The Journal of the Acoustical Society of America

Vol. 111 (4), 1872-1891
https://doi.org/10.1121/1.1458026

Abstract

This article describes a model in which the acoustic speech signal is processed to yield a discrete representation of the speech stream in terms of a sequence of segments, each of which is described by a set (or bundle) of binary distinctive features. These distinctive features specify the phonemic contrasts that are used in the language, such that a change in the value of a feature can potentially generate a new word. This model is a part of a more general model that derives a word sequence from this feature representation, the words being represented in a lexicon by sequences of feature bundles. The processing of the signal proceeds in three steps: (1) Detection of peaks, valleys, and discontinuities in particular frequency ranges of the signal leads to identification of acoustic landmarks. The type of landmark provides evidence for a subset of distinctive features called articulator-free features (e.g., [vowel], [consonant], [continuant]). (2) Acoustic parameters are derived from the signal near the landmarks to provide evidence for the actions of particular articulators, and acoustic cues are extracted by sampling selected attributes of these parameters in these regions. The selection of cues that are extracted depends on the type of landmark and on the environment in which it occurs. (3) The cues obtained in step (2) are combined, taking context into account, to provide estimates of “articulator-bound” features associated with each landmark (e.g., [lips], [high], [nasal]). These articulator-bound features, combined with the articulator-free features in (1), constitute the sequence of feature bundles that forms the output of the model. Examples of cues that are used, and justification for this selection, are given, as well as examples of the process of inferring the underlying features for a segment when there is variability in the signal due to enhancement gestures (recruited by a speaker to make a contrast more salient) or due to overlap of gestures from neighboring segments.
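
To make the representation in the abstract concrete, the sketch below illustrates two of its ideas in Python: labelling peaks, valleys, and discontinuities on a single band-energy contour, and looking up a word from a sequence of binary feature bundles. It is a toy illustration under stated assumptions, not the article's procedure. The function names (`detect_landmarks`, `bundle`, `flip`, `lookup`), the 9 dB jump threshold, the energy values, and the feature values assigned to the two example words are all hypothetical; only the feature names ([vowel], [consonant], [continuant], [lips], [high], [nasal]) come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Landmark:
    frame: int  # index into the energy contour
    kind: str   # "peak", "valley", or "discontinuity"


def detect_landmarks(energy_db, jump_db=9.0):
    """Label interior frames of one band-energy contour (values in dB).

    Assumption: a frame-to-frame change of at least `jump_db` counts as a
    discontinuity; otherwise strict local maxima and minima count as peaks
    and valleys. The threshold is illustrative, not taken from the article.
    """
    landmarks = []
    for i in range(1, len(energy_db) - 1):
        prev, cur, nxt = energy_db[i - 1], energy_db[i], energy_db[i + 1]
        if abs(cur - prev) >= jump_db:
            landmarks.append(Landmark(i, "discontinuity"))
        elif cur > prev and cur > nxt:
            landmarks.append(Landmark(i, "peak"))
        elif cur < prev and cur < nxt:
            landmarks.append(Landmark(i, "valley"))
    return landmarks


def bundle(**features):
    """Build a hashable bundle of binary features, e.g. bundle(nasal=True)."""
    return frozenset(features.items())


def flip(b, feature):
    """Return a copy of bundle `b` with one feature value inverted."""
    return frozenset((k, (not v) if k == feature else v) for k, v in b)


# Toy feature values for three segments (hypothetical, for illustration).
B = bundle(consonant=True, continuant=False, lips=True, nasal=False)
M = bundle(consonant=True, continuant=False, lips=True, nasal=True)
I = bundle(vowel=True, high=True)

# Words are stored as sequences of feature bundles.
LEXICON = {(B, I): "bee", (M, I): "me"}


def lookup(bundles):
    """Return the word whose stored bundle sequence matches exactly, else None."""
    return LEXICON.get(tuple(bundles))


if __name__ == "__main__":
    contour = [40, 41, 60, 62, 61, 45, 44, 46]  # made-up band energies in dB
    for lm in detect_landmarks(contour):
        print(lm.frame, lm.kind)
    print(lookup([B, I]))                 # bee
    print(lookup([flip(B, "nasal"), I]))  # me
```

Flipping the single feature [nasal] in the first segment turns "bee" into "me", which mirrors the abstract's point that a change in one feature value can yield a different word. Exact matching is the simplest possible lookup; a model that handles variability from enhancement gestures or gestural overlap would need graded evidence for each feature rather than hard binary matches.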
