Molecular sequence accuracy and the analysis of protein coding regions.
- 1 July 1991
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 88 (13), 5518-5522
- https://doi.org/10.1073/pnas.88.13.5518
Abstract
Molecular sequences, like all experimental data, have finite error rates. The impact of errors on the information content of molecular sequence data is dependent on the analytic paradigm used to interpret the data. We studied the impact of nucleic acid sequence errors on the ability to align predicted amino acid sequences with the sequences of related proteins. We found that with a simultaneous translation and alignment algorithm, identification of sequence homologies is resilient to the introduction of random errors. Proteins with greater than 30% sequence identity can be reliably recognized even in the presence of 1% frameshifting (insertion or deletion) error rates and 5% base substitution rates. Incorporation of prior knowledge about the location and characteristics of errors improves tolerance to error of amino acid sequence alignments. Similarly, inclusion of prior knowledge of biased codon utilization by yeast (Saccharomyces cerevisiae) allows reliable detection of correct reading frames in yeast sequences even in the presence of 5% substitution and 1% frameshift errors.
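To make the quoted error rates concrete, the following is a minimal sketch of how random errors at those rates might be introduced into a nucleotide sequence for testing. It is not the authors' procedure. The function name `introduce_errors` is hypothetical, and the error model is an assumption: errors are independent at each position, insertions and deletions are equally likely, and a substituted base is drawn uniformly from the other three.

```python
import random

BASES = "ACGT"

def introduce_errors(seq, sub_rate=0.05, indel_rate=0.01, rng=None):
    """Return a copy of seq with random substitutions, insertions and deletions.

    Assumed model (not taken from the paper): each position is affected
    independently; half of the frameshift errors are deletions and half
    are single-base insertions placed before the original base.
    """
    rng = rng or random.Random()
    out = []
    for base in seq:
        r = rng.random()
        if r < indel_rate / 2:
            # Deletion: drop this base, shifting the reading frame by -1.
            continue
        if r < indel_rate:
            # Insertion: emit a random extra base, shifting the frame by +1.
            out.append(rng.choice(BASES))
            out.append(base)
        elif r < indel_rate + sub_rate:
            # Substitution: replace the base with a different one.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

# Example: corrupt a toy sequence at the default 5% / 1% rates.
noisy = introduce_errors("ATGGCTAAAGGTCCTTTGACC" * 20, rng=random.Random(0))
```

A sequence corrupted this way could then be passed to whatever translation and alignment method is under study, to see how detection of related proteins degrades as the rates increase.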