The paper describes the development of software for automatic grammatical analysis of unrestricted, unedited English text at the Unit for Computer Research on the English Language (UCREL) at the University of Lancaster. The work is currently funded by IBM and carried out in collaboration with colleagues at IBM UK (Winchester) and IBM Yorktown Heights. The paper focuses on the lexicon component of the word tagging system, the UCREL grammar, the databanks of parsed sentences, and the tools written to support development of these components. The work has applications in speech technology, spelling correction, and other areas of natural language processing. Our immediate goal is to provide a language model that uses transition statistics to disambiguate alternative parses for a speech recognition device.
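To illustrate the general idea of disambiguation by transition statistics, the following is a minimal sketch that assumes a simple bigram model over labels with add-alpha smoothing. The tag names (Brown-style AT, NN, VBZ and so on), the toy training sequences, and the functions `train_transitions`, `score` and `disambiguate` are all hypothetical and chosen for illustration only; they are not UCREL data, and the abstract does not specify the form of the model actually used. The sketch counts label-to-label transitions in a small set of analysed sequences and then prefers the candidate analysis whose transitions are most probable.

```python
import math
from collections import Counter

# Hypothetical sketch: bigram transition model over labels (not the UCREL model).

def train_transitions(sequences):
    """Count label bigrams, padding each sequence with boundary markers."""
    bigrams = Counter()
    unigrams = Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            unigrams[prev] += 1  # count of prev as the left side of a transition
    return bigrams, unigrams

def score(seq, bigrams, unigrams, vocab_size, alpha=1.0):
    """Log probability of a label sequence under add-alpha smoothed transitions."""
    padded = ["<s>"] + list(seq) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        num = bigrams[(prev, cur)] + alpha
        den = unigrams[prev] + alpha * vocab_size
        total += math.log(num / den)
    return total

def disambiguate(candidates, bigrams, unigrams, vocab_size):
    """Return the candidate analysis with the highest transition score."""
    return max(candidates, key=lambda c: score(c, bigrams, unigrams, vocab_size))

# Toy, invented training data (illustrative only).
training = [
    ("AT", "NN", "VBZ"),
    ("AT", "JJ", "NN", "VBZ"),
    ("PPS", "VBZ", "AT", "NN"),
]
tagset = {"AT", "NN", "VBZ", "JJ", "PPS", "VB"}
vocab_size = len(tagset) + 1  # possible successors: every tag plus "</s>"

bigrams, unigrams = train_transitions(training)

# Two alternative analyses of a three-word string such as "the run fails".
candidates = [("AT", "NN", "VBZ"), ("AT", "VB", "VBZ")]
best = disambiguate(candidates, bigrams, unigrams, vocab_size)
print(best)
```

Here the noun reading wins because the transition AT to NN occurs in the toy data while AT to VB does not. Smoothing keeps unseen transitions from receiving zero probability, so the model can still rank analyses that contain them.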
