The KA/KS Ratio Test for Assessing the Protein-Coding Potential of Genomic Regions: An Empirical and Simulation Study
Open Access
- 14 December 2001
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 12 (1) , 198-202
- https://doi.org/10.1101/gr.200901
Abstract
Comparative genomics is a simple, powerful way to increase the accuracy of gene prediction. In this study, we show the utility of a simple test for the identification of protein-coding exons using human/mouse sequence comparisons. The test takes advantage of the fact that in the vast majority of coding regions, synonymous substitutions (KS) occur much more frequently than nonsynonymous ones (KA), and it uses the KA/KS ratio as the criterion. We show the following: (1) most human and mouse exons are sufficiently long and have a suitable degree of sequence divergence for the test to perform reliably; (2) the test is suited to the identification of long exons and single-exon genes, which are difficult to predict by current methods; (3) the test has a false-negative rate lower than that of most current gene-prediction methods and a false-positive rate lower than that of all current methods; (4) the test has been automated and can be used in combination with other existing gene-prediction methods.
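To make the KA/KS criterion concrete, the following is a minimal sketch of how the two rates could be estimated for a pair of aligned sequences. The abstract does not say which estimator the authors used, so this sketch assumes a simplified Nei-Gojobori-style counting method with a Jukes-Cantor correction. The function names (`syn_sites`, `count_diffs`, `ka_ks`) are hypothetical, the inputs are assumed to be gap-free and in frame, and changes to stop codons are counted as nonsynonymous for simplicity.

```python
from itertools import permutations
from math import log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def syn_sites(codon):
    """Number of synonymous sites in one codon (between 0 and 3)."""
    s = 0.0
    for i in range(3):
        for b in BASES:
            if b != codon[i]:
                mutant = codon[:i] + b + codon[i + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def count_diffs(c1, c2):
    """(synonymous, nonsynonymous) differences, averaged over mutational paths."""
    pos = [i for i in range(3) if c1[i] != c2[i]]
    if not pos:
        return 0.0, 0.0
    sd = nd = 0.0
    paths = 0
    for order in permutations(pos):
        cur, s, n, valid = c1, 0, 0, True
        for i in order:
            nxt = cur[:i] + c2[i] + cur[i + 1:]
            if CODE[nxt] == "*":  # skip paths that pass through a stop codon
                valid = False
                break
            if CODE[nxt] == CODE[cur]:
                s += 1
            else:
                n += 1
            cur = nxt
        if valid:
            paths += 1
            sd += s
            nd += n
    if paths == 0:  # fallback assumption: treat all changes as nonsynonymous
        return 0.0, float(len(pos))
    return sd / paths, nd / paths


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits."""
    if p >= 0.75:
        return float("inf")  # saturated; the distance cannot be estimated
    return -0.75 * log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (KA, KS, KA/KS); the ratio is None when KS is zero."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, gap-free, and in frame")
    S = Sd = Nd = 0.0
    n_codons = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        # Skip codons with ambiguous bases and codon pairs containing a stop.
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        n_codons += 1
        S += (syn_sites(c1) + syn_sites(c2)) / 2
        sd, nd = count_diffs(c1, c2)
        Sd += sd
        Nd += nd
    N = 3 * n_codons - S
    ka = jukes_cantor(Nd / N) if N > 0 else 0.0
    ks = jukes_cantor(Sd / S) if S > 0 else 0.0
    ratio = ka / ks if ks > 0 else None
    return ka, ks, ratio
```

Under these assumptions, a candidate region whose KA/KS ratio is well below 1 behaves as expected for a protein-coding exon. The decision threshold and any significance test are deliberately left out, since the abstract does not state them.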