Automatic parsing of parental verbal input

1 February 2004

journal article
research article
Published by Springer Nature in Behavior Research Methods, Instruments & Computers

Vol. 36 (1) , 113-126
https://doi.org/10.3758/bf03195557

Abstract

To evaluate theoretical proposals regarding the course of child language acquisition, researchers often need to rely on the processing of large numbers of syntactically parsed utterances, both from children and from their parents. Because it is so difficult to do this by hand, there are currently no parsed corpora of child language input data. To automate this process, we developed a system that combined the MOR tagger, a rule-based parser, and statistical disambiguation techniques. The resultant system obtained nearly 80% correct parses for the sentences spoken to children. To achieve this level, we had to construct a particular processing sequence that minimizes problems caused by the coverage/ambiguity tradeoff in parser design. These procedures are particularly appropriate for use with the CHILDES database, an international corpus of transcripts. The data and programs are now freely available over the Internet.
