Vertebrate gene predictions and the problem of large genes
- 1 September 2003
- journal article
- review article
- Published by Springer Nature in Nature Reviews Genetics
- Vol. 4 (9), 741-749
- https://doi.org/10.1038/nrg1160
Abstract
To find unknown protein-coding genes, annotation pipelines use a combination of ab initio gene prediction and similarity to experimentally confirmed genes or proteins. Here, we show that although the ab initio predictions have an intrinsically high false-positive rate, they also have a consistently low false-negative rate. The incorporation of similarity information is meant to reduce the false-positive rate, but in doing so it increases the false-negative rate. The crucial variable is gene size (including introns): genes of the most extreme sizes, especially very large genes, are most likely to be incorrectly predicted.