FASTUS: A System for Extracting Information from Natural-Language Text

Abstract

FASTUS is a system for extracting information from free text in English, and potentially other languages, for entry into a database and potentially for other applications. It works essentially as a cascaded, nondeterministic finite-state automaton. FASTUS operates in four steps. In Step (1), sentences are scanned for certain trigger words to determine whether further processing should be done. In Step (2), noun groups, verb groups, prepositions, and some other particles are recognized. The input to Step (3) is the sequence of phrases recognized in Step (2); patterns of interest are identified and corresponding incident structures are built up. In Step (4), incident structures that derive from the same incident are identified and merged, and the merged structures are used to generate database entries. FASTUS is an order of magnitude faster than any comparable system; it can process a news report in less than eleven seconds on average. This speed translates directly into fast development time: in the three and a half weeks between its first use and the MUC-4 evaluation in May 1992, we were able to build up its domain knowledge to a point where it was among the leaders in the evaluation.
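
The four steps described above can be illustrated with a toy cascade. This is only a minimal sketch under stated assumptions, not the FASTUS implementation: the trigger words, the lexicon, the two clause patterns, the slot names (perpetrator, target, location), and all function names are hypothetical, and Python's backtracking regular expressions stand in for the finite-state machinery the abstract describes.

```python
import re

# Hypothetical trigger words and lexicon; the abstract does not list the real ones.
TRIGGERS = {"bombed"}
LEXICON = {  # word -> one-letter tag: D=determiner, N=noun, A=auxiliary, V=verb, P=preposition
    "the": "D", "a": "D", "guerrillas": "N", "embassy": "N", "lima": "N",
    "was": "A", "bombed": "V", "by": "P", "in": "P",
}


def step1_filter(sentences):
    """Step 1: keep only sentences that contain a trigger word."""
    return [s for s in sentences if TRIGGERS & set(re.findall(r"[a-z]+", s.lower()))]


def step2_phrases(sentence):
    """Step 2: group tokens into noun groups (NG), verb groups (VG), and prepositions (P)."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    tags = "".join(LEXICON.get(t, "O") for t in tokens)  # one tag character per token
    pattern = re.compile(r"(?P<NG>D?N+)|(?P<VG>A?V)|(?P<P>P)")
    return [(m.lastgroup, " ".join(tokens[m.start():m.end()]))
            for m in pattern.finditer(tags)]


def step3_patterns(phrases):
    """Step 3: match clause patterns over the phrase sequence and build incident structures."""
    incidents = []
    for i in range(len(phrases) - 2):
        (k1, t1), (k2, t2), (k3, t3) = phrases[i:i + 3]
        if (k1, k2) != ("NG", "VG"):
            continue
        passive = t2.startswith("was ")
        if not passive and k3 == "NG":
            # Active clause: <perpetrator> bombed <target>
            incidents.append({"perpetrator": t1, "target": t3, "location": None})
        elif passive and k3 == "P" and i + 3 < len(phrases) and phrases[i + 3][0] == "NG":
            # Passive clause: <target> was bombed by <perpetrator> / in <location>
            slot = {"by": "perpetrator", "in": "location"}.get(t3)
            if slot:
                incident = {"perpetrator": None, "target": t1, "location": None}
                incident[slot] = phrases[i + 3][1]
                incidents.append(incident)
    return incidents


def step4_merge(incidents):
    """Step 4: merge incident structures whose filled slots do not conflict."""
    merged = []
    for inc in incidents:
        for m in merged:
            if all(m[k] is None or inc[k] is None or m[k] == inc[k] for k in m):
                for k in m:
                    if m[k] is None:
                        m[k] = inc[k]
                break
        else:
            merged.append(dict(inc))
    return merged


def extract(text):
    """Run the four steps in sequence over a short text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    incidents = []
    for sentence in step1_filter(sentences):
        incidents.extend(step3_patterns(step2_phrases(sentence)))
    return step4_merge(incidents)


if __name__ == "__main__":
    print(extract("Guerrillas bombed the embassy. "
                  "The embassy was bombed in Lima. "
                  "The weather was fine."))
```

In this sketch the third sentence is discarded in Step 1 because it contains no trigger word, and the two remaining sentences yield partial incident structures that Step 4 merges into one, since their filled slots agree. A real system would need far richer lexicons, patterns, and merging criteria than this compatibility check.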
