Combination of text-mining algorithms increases the performance
Open Access
- 9 June 2006
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 22 (17) , 2151-2157
- https://doi.org/10.1093/bioinformatics/btl281
Abstract
Motivation: Recently, several information extraction systems have been developed to retrieve relevant information out of biomedical text. However, these methods represent individual efforts. In this paper, we show that by combining different algorithms and their outcome, the results improve significantly. For this reason, CONAN has been created, a system which combines different programs and their outcome. Its methods include tagging of gene/protein names, finding interaction and mutation data, tagging of biological concepts and linking to MeSH and Gene Ontology terms.
Results: In this paper, we will present data that show that combining different text-mining algorithms significantly improves the results. Not only is CONAN a full-scale approach that will ultimately cover all of PubMed/MEDLINE, we also show that this universality has no effect on quality: our system performs as well as or better than existing systems.
Availability: The LDD corpus presented is available by request to the author. The system will be available shortly. For information and updates on CONAN please visit
Contact: rainer@cs.uu.nl
Supplementary information: Supplementary data are available at Bioinformatics online.