A multi-level text mining method to extract biological relationships
- 1 January 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 21, 97-108
- https://doi.org/10.1109/csb.2002.1039333
Abstract
Accurate and computationally efficient approaches to discovering relationships between biological objects in text documents are important for biologists developing biological models. This paper presents a novel approach to extracting relationships between multiple biological objects that are present in a text document. The approach involves object identification, reference resolution, ontology and synonym discovery, and extraction of object-object relationships. Hidden Markov models (HMMs), dictionaries, and N-gram models are used to set the framework for the complex task of extracting object-object relationships. Experiments were carried out using a corpus of one thousand Medline abstracts. Intermediate results were obtained for the object identification process, synonym discovery, and finally the relationship extraction. From this corpus, 53 relationships were extracted, of which 43 were correct, giving a specificity of 81%. The approach is both adaptable and scalable to new problems, as opposed to rule-based methods.
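As a rough illustration of the stages named in the abstract, the following minimal Python sketch identifies objects with a synonym dictionary and pairs objects that co-occur in a sentence. It is not the paper's method: the sentence-level co-occurrence step is a deliberately crude stand-in for the HMM and N-gram stages, and the dictionary entries and the function names (`identify_objects`, `candidate_relationships`, `fraction_correct`) are hypothetical. The last function only reproduces the arithmetic behind the reported figure (43 correct out of 53 extracted).

```python
import re
from itertools import combinations

# Hypothetical synonym dictionary: lowercase surface form -> canonical name.
SYNONYMS = {
    "p53": "TP53",
    "tp53": "TP53",
    "mdm2": "MDM2",
    "hdm2": "MDM2",
}


def identify_objects(sentence):
    """Return canonical names of dictionary objects in order of first appearance."""
    found = []
    for token in re.findall(r"[A-Za-z0-9]+", sentence):
        name = SYNONYMS.get(token.lower())
        if name is not None and name not in found:
            found.append(name)
    return found


def candidate_relationships(text):
    """Pair objects that co-occur in one sentence (stand-in for the HMM/N-gram stage)."""
    pairs = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for a, b in combinations(sorted(identify_objects(sentence)), 2):
            pairs.add((a, b))
    return pairs


def fraction_correct(correct, extracted):
    """Share of extracted relationships that were judged correct."""
    return correct / extracted


if __name__ == "__main__":
    text = "HDM2 binds p53. The p53 protein is degraded."
    print(candidate_relationships(text))
    print(round(100 * fraction_correct(43, 53)))  # figure reported in the abstract
```

Mapping synonyms to a canonical name before pairing means that "HDM2" and "MDM2" contribute to the same candidate relationship, which is the role synonym discovery plays in the pipeline described above.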
This publication has 9 references indexed in Scilit:
- GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics, 2001
- Disambiguating proteins, genes, and RNA in text: a machine learning approach. Bioinformatics, 2001
- The Sequence of the Human Genome. Science, 2001
- PNAD-CSS: a workbench for constructing a protein name abbreviation dictionary. Bioinformatics, 2000
- Extracting the names of genes and gene products with a hidden Markov model. Published by Association for Computational Linguistics (ACL), 2000
- WordNet. Published by MIT Press, 1998
- Biological Sequence Analysis. Published by Cambridge University Press (CUP), 1998
- Basic Local Alignment Search Tool. Journal of Molecular Biology, 1990