Sequence Compositional Complexity of DNA through an Entropic Segmentation Method

Abstract
A new complexity measure, based on the entropic segmentation of DNA sequences into compositionally homogeneous domains, is proposed. Sequence compositional complexity (SCC) deals directly with the complex heterogeneity of nonstationary DNA sequences. The plot of SCC as a function of significance level provides a profile of sequence structure at different length scales. SCC is found to be higher in sequences with long-range correlations than in those without, and higher in noncoding sequences than in coding sequences. Furthermore, a general agreement is found between the SCC of a DNA sequence and the biological complexity of the organism, attributable to an increasingly complex organization of noncoding DNA over the course of evolution.
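
As a rough illustration of how an entropy-based measure can score a segmentation, the sketch below computes the Shannon entropy of the whole sequence minus the length-weighted entropy of its segments (a Jensen-Shannon-type divergence). The abstract does not state the formula, so treating this quantity as SCC is an assumption. The function names and the `cuts` argument are hypothetical, and the search for domain boundaries at a given significance level is not shown.

```python
from collections import Counter
from math import log2


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the symbol composition of seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(seq).values())


def compositional_complexity(seq, cuts):
    """Entropy of the whole sequence minus the length-weighted mean
    entropy of its segments.

    Assumption: this Jensen-Shannon-type quantity stands in for SCC;
    the abstract itself does not give the definition.
    cuts: sorted interior cut positions (hypothetical input; a real
    segmentation would choose them by a significance test).
    """
    bounds = [0, *cuts, len(seq)]
    segments = [seq[a:b] for a, b in zip(bounds, bounds[1:])]
    weighted = sum(len(s) / len(seq) * shannon_entropy(s) for s in segments)
    return shannon_entropy(seq) - weighted
```

For the toy sequence "AAAATTTT", a cut at position 4 separates two perfectly homogeneous domains and gives the largest value, whereas leaving the sequence uncut gives zero, since a single domain carries no compositional heterogeneity between segments.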