Robust Relational Parsing over Biomedical Literature: Extracting Inhibit Relation

Abstract
We describe the design of a robust parser for identifying and extracting biomolecular relations from the biomedical literature. Separate automata over distinct syntactic domains were developed for extraction of nominal-based relational information versus verbal-based relations. This allowed us to optimize the grammars separately for each module, regardless of any specific relation resulting in significantly better performance. A unique feature of this system is the use of text-based anaphora resolution to enhance the results of argument binding in relational extraction. We demonstrate the performance of our system on inhibition-relations, and present our initial results measured against an annotated text used as a gold standard for evaluation purposes. The results represent a significant improvement over previously published results on extracting such relations from Medline: Precision was 90 %, Recall 57 %, and Partial Recall 22%. These results demonstrate the effectiveness of a corpus-based linguistic approach to information extraction over Medline.