RelEx—Relation extraction using dependency parse trees
Open Access
- 1 December 2006
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 23 (3), 365-371
- https://doi.org/10.1093/bioinformatics/btl616
Abstract
Motivation: The discovery of regulatory pathways, signal cascades, metabolic processes or disease models requires knowledge of individual relations, such as physical or regulatory interactions between genes and proteins. Most interactions mentioned in the free text of biomedical publications are not yet contained in structured databases.
Results: We developed RelEx, an approach for relation extraction from free text. It is based on natural language preprocessing that produces dependency parse trees, to which a small number of simple rules are applied. We applied RelEx to a comprehensive set of one million MEDLINE abstracts dealing with gene and protein relations and extracted ~150 000 relations with an estimated performance of 80% precision and 80% recall.
Availability: The natural language preprocessing tools used are free for academic research. Test sets and relation term lists are available from our website ( ).
Contact: katrin.fundel@bio.ifi.lmu.de
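To make the idea of applying simple rules to a dependency parse tree concrete, the following is a minimal sketch and not the RelEx rule set itself. It assumes a sentence has already been parsed elsewhere into (head, dependent) token-index pairs and that gene/protein mentions have already been tagged. The single rule shown (report a relation when a relation term lies on the tree path connecting two entities), the `RELATION_TERMS` list, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical relation terms; the paper's own term lists are not reproduced here.
RELATION_TERMS = {"activates", "inhibits", "phosphorylates", "binds"}


def shortest_path(edges, start, goal):
    """Breadth-first search over the dependency tree, treated as undirected."""
    neighbours = {}
    for head, dependent in edges:
        neighbours.setdefault(head, []).append(dependent)
        neighbours.setdefault(dependent, []).append(head)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the two tokens are not connected


def extract_relations(tokens, edges, entity_indices):
    """Illustrative rule: emit (entity, term, entity) when a relation term
    occurs strictly between two entities on their connecting tree path."""
    relations = []
    for i, a in enumerate(entity_indices):
        for b in entity_indices[i + 1:]:
            path = shortest_path(edges, a, b)
            if path is None:
                continue
            for idx in path[1:-1]:
                if tokens[idx].lower() in RELATION_TERMS:
                    relations.append((tokens[a], tokens[idx], tokens[b]))
    return relations


# Hand-built parse of "MEK1 phosphorylates ERK2": the verb heads both entities.
tokens = ["MEK1", "phosphorylates", "ERK2"]
edges = [(1, 0), (1, 2)]  # (head index, dependent index)
entities = [0, 2]         # token indices tagged as genes/proteins
print(extract_relations(tokens, edges, entities))
# → [('MEK1', 'phosphorylates', 'ERK2')]
```

A real system would also need to handle negation, enumerations of several entities, and the direction of the relation, none of which this sketch attempts.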