Statistical method for predicting protein coding regions in nucleic acid sequences
- 1 November 1987
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 3 (4), 287-295
- https://doi.org/10.1093/bioinformatics/3.4.287
Abstract
Protein coding regions of a genome fragment can be mathematically predicted by studying variations in the statistical properties or by searching for the signals characteristic of the junctions between the coding and non-coding regions. We propose here a new statistical method using correspondence analysis. This method does not use any reference codon set but takes into account the codon usage homogeneity along the studied genome fragment. A comparison with previously published methods, especially the ‘codon usage method’ of Staden, has been made, and two examples are presented here. Applications to the analysis of prokaryotic operons and eukaryotic split genes are also discussed. Use of the method has also shown two structures not previously described: i) in the human prt gene, a strong triplet structure exists in a non-coding region; ii) in the human tp-a gene, codon usage is not uniform between the different exons.
This publication has 6 references indexed in Scilit:
- Nucleotide distribution and the recognition of coding regions in DNA sequences: An information theory approach. Journal of Theoretical Biology, 1985
- Application of learning techniques to splicing site recognition. Biochimie, 1985
- Conservation of RNA secondary structures in two intron families including mitochondrial-, chloroplast- and nuclear-encoded members. The EMBO Journal, 1983
- Nucleotide sequence of bacteriophage λ DNA. Journal of Molecular Biology, 1982
- Single base substitution in an intron of oxidase gene compensates splicing defects of the cytochrome b gene. Nature, 1982
- Method to determine the reading frame of a protein from the purine/pyrimidine genome sequence and its possible evolutionary justification. Proceedings of the National Academy of Sciences, 1981
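The abstract describes applying correspondence analysis to codon usage along a genome fragment, without a reference codon set. The abstract does not give the procedure itself, so the following is only a minimal sketch of the general idea under stated assumptions: codons are counted in overlapping windows read in one frame, and a textbook correspondence analysis (SVD of standardized residuals) is run on the window-by-codon table. The function names, the window size and the step are hypothetical choices for illustration, not the authors' parameters.

```python
import numpy as np
from itertools import product

# All 64 codons in a fixed order, with a lookup from codon to column index.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}


def window_codon_counts(seq, window=120, step=60, frame=0):
    """Count codons in overlapping windows read in a single frame.

    Assumption: `window` and `step` are multiples of 3, so every window
    is read in the same frame. Triplets with non-ACGT characters are skipped.
    """
    seq = seq.upper()
    rows = []
    for start in range(frame, len(seq) - window + 1, step):
        counts = np.zeros(64)
        for i in range(start, start + window, 3):
            idx = CODON_INDEX.get(seq[i:i + 3])
            if idx is not None:
                counts[idx] += 1
        rows.append(counts)
    return np.array(rows)


def correspondence_analysis(table, n_axes=2):
    """Textbook correspondence analysis of a contingency table.

    Returns the principal coordinates of the rows (windows) on the first
    `n_axes` axes and the inertia carried by each of those axes.
    Assumption: every row contains at least one counted codon.
    """
    table = np.asarray(table, dtype=float)
    table = table[:, table.sum(axis=0) > 0]   # drop codons never observed
    P = table / table.sum()                   # correspondence matrix
    r = P.sum(axis=1)                         # row masses
    c = P.sum(axis=0)                         # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)    # standardized residuals
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    return row_coords[:, :n_axes], sv[:n_axes] ** 2
```

In this sketch, windows with similar codon usage fall close together on the first axes, so a change in codon usage along the fragment appears as a shift in window coordinates. Whether such a shift marks a coding boundary is an interpretation step that this sketch does not attempt.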