Genomix: a method for combining gene-finders' predictions, which uses evolutionary conservation of sequence and intron–exon structure
Open Access
- 5 May 2007
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 23 (12) , 1468-1475
- https://doi.org/10.1093/bioinformatics/btm133
Abstract
Motivation: Correct gene predictions are crucial for most analyses of genomes. However, in the absence of transcript data, gene prediction is still challenging. One way to improve gene-finding accuracy in such genomes is to combine the exons predicted by several gene-finders, so that gene-finders that make uncorrelated errors can correct each other.
Results: We present a method for combining gene-finders called Genomix. Genomix selects the predicted exons that are best conserved within and/or between species in terms of sequence and intron–exon structure, and combines them into a gene structure. Genomix was used to combine predictions from four gene-finders for Caenorhabditis elegans, by selecting the predicted exons that are best conserved with C.briggsae and C.remanei. On a set of ∼1500 confirmed C.elegans genes, Genomix increased the exon-level specificity by 10.1% and sensitivity by 2.7% compared to the best input gene-finder.
Availability: Scripts and Supplementary Material can be found at http://www.sanger.ac.uk/Software/analysis/genomix
Contact: alc@sanger.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
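The abstract reports exon-level sensitivity and specificity without defining them. The sketch below illustrates the definitions conventionally used in gene-finder evaluation, which are assumed here rather than taken from the paper: an exon counts as correct only if both of its boundaries match a confirmed exon exactly, sensitivity is the fraction of confirmed exons that were predicted, and specificity is the fraction of predicted exons that are correct. The function name `exon_level_accuracy`, the `Exon` tuple layout, and all coordinates are hypothetical and purely illustrative.

```python
from typing import Set, Tuple

# Hypothetical exon representation: (sequence id, start, end, strand).
Exon = Tuple[str, int, int, str]


def exon_level_accuracy(predicted: Set[Exon], confirmed: Set[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exon level.

    Assumes the conventional definitions: an exon is correct only when
    both boundaries and the strand match a confirmed exon exactly.
    """
    correct = predicted & confirmed  # exact-match exons
    sensitivity = len(correct) / len(confirmed) if confirmed else 0.0
    specificity = len(correct) / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Made-up coordinates, for illustration only.
confirmed = {
    ("chrI", 100, 200, "+"),
    ("chrI", 300, 400, "+"),
    ("chrI", 500, 600, "+"),
    ("chrI", 700, 800, "+"),
}
predicted = {
    ("chrI", 100, 200, "+"),  # exact match
    ("chrI", 300, 410, "+"),  # wrong 3' boundary, counted as incorrect
    ("chrI", 500, 600, "+"),  # exact match
}

sensitivity, specificity = exon_level_accuracy(predicted, confirmed)
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```

In this toy case two of the four confirmed exons are recovered and two of the three predictions are correct. Under these definitions, a combiner that discards poorly supported exons mainly raises specificity, which is consistent with the larger specificity gain the abstract reports.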