Heuristic informational analysis of sequences
- 10 January 1986
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 14 (1) , 179-196
- https://doi.org/10.1093/nar/14.1.179
Abstract
Nucleotide or amino-acid sequences are interpreted as successions of words of length k (k-tuples), the frequencies of which are highly variable in different statistical populations of genes or proteins. After building k-tuple reference tables from coherent subsets or entire data banks, the local information content profile of individual sequences is drawn. Anomalous regions (peaks or depressions) of such a profile can lead to the discovery and identification of specific sequence patterns. Following the same principle, the simultaneous use of two reference statistical populations and the computation of an index combining the two information profiles lead to a general and powerful method of discriminant analysis. The identification of a “signal” associated with gene conversion, intron/exon discrimination, and the location of function-specific patterns in proteins are given as examples of successful applications of this heuristic informational approach.
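The abstract gives no formulas, so the following Python sketch is only one plausible reading of the approach, not the authors' implementation. It assumes that the information content of a word is -log2 of its smoothed frequency in the reference table, that the profile is smoothed with a moving average, and that the two-population index is the difference between the two profiles. All function names, the pseudocount, and the window size are hypothetical choices made for illustration.

```python
from collections import Counter
from math import log2


def ktuple_table(sequences, k):
    """Count overlapping k-tuples over a reference population of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def information_profile(seq, counts, k, alphabet_size=4, pseudocount=1.0):
    """Per-position information, -log2 p(word), for each k-tuple of seq.

    Assumption: p(word) uses additive smoothing so unseen words stay finite.
    """
    total = sum(counts.values()) + pseudocount * alphabet_size ** k
    return [
        -log2((counts[seq[i:i + k]] + pseudocount) / total)
        for i in range(len(seq) - k + 1)
    ]


def smooth(profile, window):
    """Moving average of a profile; returns [] if the window is too long."""
    if window <= 0 or window > len(profile):
        return []
    running = sum(profile[:window])
    out = [running / window]
    for i in range(window, len(profile)):
        running += profile[i] - profile[i - window]
        out.append(running / window)
    return out


def discriminant_index(seq, counts_a, counts_b, k, **kwargs):
    """Assumed index: information under B minus information under A.

    Positive values mean the local words are more typical of population A.
    """
    info_a = information_profile(seq, counts_a, k, **kwargs)
    info_b = information_profile(seq, counts_b, k, **kwargs)
    return [b - a for a, b in zip(info_a, info_b)]


if __name__ == "__main__":
    # Toy reference populations (invented data, for illustration only).
    table_a = ktuple_table(["ACACACACAC"], k=2)
    table_b = ktuple_table(["GTGTGTGTGT"], k=2)
    query = "ACACGTGT"
    index = discriminant_index(query, table_a, table_b, k=2)
    print(smooth(index, window=2))
```

Under these assumptions, peaks or depressions in the smoothed profile mark regions whose word usage departs from the reference population, and the sign of the index suggests which of the two populations a region resembles more.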