Repeats and correlations in human DNA sequences
- 26 June 2003
- journal article
- research article
- Published by American Physical Society (APS) in Physical Review E
- Vol. 67 (6), 061913
- https://doi.org/10.1103/physreve.67.061913
Abstract
We study the nucleotide-nucleotide mutual information function of the DNA sequences of the three completely sequenced human chromosomes 20, 21, and 22. We find in each human chromosome (i) the absence of the base pair (bp) sequence periodicity characteristic of protein coding regions, (ii) the absence of the sequence periodicity characteristic of both protein secondary structure and DNA bendability, and (iii) the presence of significant statistical dependencies at two distinct distances along the sequence. We investigate to what degree the density and composition of interspersed repeats might explain these observed statistical patterns in all three human chromosomes. We use simple stochastic models to substitute known interspersed repeats and find by numerical studies that (iv) the presence of interspersed repeats dominates short-range correlations, as measured by the mutual information function, on the scale of several hundred base pairs in human chromosomes 20, 21, and 22. On the other hand, we find that (v) interspersed repeats contribute only weakly to long-range correlations due to the clustering of highly abundant Alu repeats.
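The abstract does not spell out how the mutual information function is computed. As a minimal sketch, assuming the standard Shannon definition (the sum over nucleotide pairs (a, b) of p_ab(k) * log2(p_ab(k) / (p_a * p_b)), where p_ab(k) is the probability of finding a and b separated by k positions), it could be estimated from a sequence as below. The function name `mutual_information`, the use of bits as the unit, and the choice to skip pairs containing ambiguous symbols such as N are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def mutual_information(seq, k):
    """Estimate the mutual information (in bits) between nucleotides k positions apart.

    Assumption: pairs in which either symbol is not A, C, G, or T are skipped.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    # Collect all pairs (seq[i], seq[i + k]) made of unambiguous nucleotides.
    pairs = [(a, b) for a, b in zip(seq, seq[k:]) if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                 # counts of (a, b) at distance k
    left = Counter(a for a, _ in pairs)    # marginal counts of the first symbol
    right = Counter(b for _, b in pairs)   # marginal counts of the second symbol
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log2(p_ab * n * n / (left[a] * right[b]))
    return mi
```

On a toy sequence made of a repeating four-letter unit, such as `"ACGT" * 50`, the estimate at `k = 4` reaches the maximum of 2 bits, while a sequence of a single repeated nucleotide gives 0. Scanning `k` over a range of distances on a real chromosome would show peaks at any distances where statistical dependencies are present; for finite sequences the raw estimate has a small positive bias that this sketch does not correct.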