Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach.
- 15 December 1991
- journal article
- Published in Proceedings of the National Academy of Sciences
- Vol. 88 (24), 11261-11265
- https://doi.org/10.1073/pnas.88.24.11261
Abstract
Genes in higher eukaryotes may span tens or hundreds of kilobases with the protein-coding regions accounting for only a few percent of the total sequence. Identifying genes within large regions of uncharacterized DNA is a difficult undertaking and is currently the focus of many research efforts. We describe a reliable computational approach for locating protein-coding portions of genes in anonymous DNA sequence. Using a concept suggested by robotic environmental sensing, our method combines a set of sensor algorithms and a neural network to localize the coding regions. Several algorithms that report local characteristics of the DNA sequence, and therefore act as sensors, are also described. In its current configuration the "coding recognition module" identifies 90% of coding exons of length 100 bases or greater with less than one false positive coding exon indicated per five coding exons indicated. This is a significantly lower false positive rate than any method of which we are aware. This module demonstrates a method with general applicability to sequence-pattern recognition problems and is available for current research efforts.
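The general idea in the abstract, several sensor algorithms scoring a stretch of sequence and a neural network combining those scores, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two toy sensors (GC fraction and stop-codon-free reading frames), the window size and step, and the hand-picked, untrained network weights. The paper's actual sensors, network architecture, and training procedure are not reproduced.

```python
import math

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Hypothetical, untrained weights chosen only so the sketch runs;
# a real system would learn these from annotated sequences.
HIDDEN_WEIGHTS = [[4.0, 2.0], [-3.0, 5.0]]
HIDDEN_BIASES = [-3.0, -2.0]
OUTPUT_WEIGHTS = [2.5, 2.5]
OUTPUT_BIAS = -2.0


def gc_sensor(window):
    """Toy sensor: fraction of G or C bases in the window."""
    if not window:
        return 0.0
    return sum(base in "GC" for base in window) / len(window)


def open_frame_sensor(window):
    """Toy sensor: best stop-codon-free codon fraction over the three forward frames."""
    best = 0.0
    for frame in range(3):
        codons = [window[i:i + 3] for i in range(frame, len(window) - 2, 3)]
        if codons:
            free = sum(codon not in STOP_CODONS for codon in codons) / len(codons)
            best = max(best, free)
    return best


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def combine(sensor_values):
    """One-hidden-layer feed-forward network mapping sensor values to a score in (0, 1)."""
    hidden = [
        sigmoid(sum(w * v for w, v in zip(weights, sensor_values)) + bias)
        for weights, bias in zip(HIDDEN_WEIGHTS, HIDDEN_BIASES)
    ]
    return sigmoid(sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS)


def score_windows(sequence, window_size=30, step=10):
    """Slide a window along the sequence and return (start, score) pairs."""
    sequence = sequence.upper()
    scores = []
    for start in range(0, len(sequence) - window_size + 1, step):
        window = sequence[start:start + window_size]
        sensors = [gc_sensor(window), open_frame_sensor(window)]
        scores.append((start, combine(sensors)))
    return scores


if __name__ == "__main__":
    for start, score in score_windows("ATGGCC" * 10):
        print(start, round(score, 3))
```

In this sketch each sensor reduces a window to a single number, and the network only sees those numbers, so sensors can be added or replaced without changing the combining step. Windows whose score exceeds a chosen threshold would be reported as candidate coding regions; the threshold, like the weights, would have to be set from labeled data.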