Principal Component Analysis and Large-Scale Correlations in Non-Coding Sequences of Human DNA

Abstract
We have calculated a full set of second-order correlation functions of nucleotides in noncoding DNA. They are found to be invariant under permutation of A and T and, independently, under permutation of C and G. Considering the correlation functions as a 4 × 4 matrix with a symmetrical basis, we have found the principal components, that is, combinations with zero cross-correlations. The three principal components correspond to the base combinations (A + T − C − G), (A − T), and (C − G). The long-range behavior of these principal components follows power laws with different critical exponents.
Key words: long-range correlations, DNA, principal component analysis
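As an illustration of the construction summarized in the abstract, the following minimal Python sketch builds a 4 × 4 correlation matrix at a fixed distance d and projects it onto the three base combinations. It is a sketch under stated assumptions, not the authors' procedure: the abstract does not define the correlation function, so the common form C_ab(d) = P_ab(d) − P_a P_b is assumed here, and the names correlation_matrix, symmetrize, and project are hypothetical helpers introduced for this example. The symmetrize step imposes the A/T and C/G permutation invariance by averaging, whereas in the paper that invariance is an empirical finding.

```python
import random

BASES = "ATCG"  # index order used throughout: A=0, T=1, C=2, G=3

def correlation_matrix(seq, d):
    """4x4 matrix C[a][b] = P_ab(d) - P_a * P_b (assumed definition)."""
    n = len(seq) - d
    if d < 1 or n < 1:
        raise ValueError("d must satisfy 1 <= d < len(seq)")
    idx = {base: i for i, base in enumerate(BASES)}
    pair = [[0.0] * 4 for _ in range(4)]
    for i in range(n):
        pair[idx[seq[i]]][idx[seq[i + d]]] += 1.0 / n
    left = [sum(row) for row in pair]                       # frequency at position i
    right = [sum(pair[a][b] for a in range(4)) for b in range(4)]  # at position i + d
    return [[pair[a][b] - left[a] * right[b] for b in range(4)] for a in range(4)]

# Identity, A<->T swap, C<->G swap, and both swaps together.
PERMUTATIONS = [(0, 1, 2, 3), (1, 0, 2, 3), (0, 1, 3, 2), (1, 0, 3, 2)]

def symmetrize(C):
    """Average C over the A<->T and C<->G permutations (illustrative helper)."""
    return [[sum(C[p[a]][p[b]] for p in PERMUTATIONS) / 4.0 for b in range(4)]
            for a in range(4)]

# The three base combinations named in the abstract, as vectors over (A, T, C, G).
COMPONENTS = {
    "A+T-C-G": (1, 1, -1, -1),
    "A-T": (1, -1, 0, 0),
    "C-G": (0, 0, 1, -1),
}

def project(C, u, v):
    """Bilinear form u^T C v: the correlation between combinations u and v."""
    return sum(u[a] * C[a][b] * v[b] for a in range(4) for b in range(4))

if __name__ == "__main__":
    random.seed(0)
    seq = "".join(random.choice(BASES) for _ in range(10000))  # synthetic, not real DNA
    S = symmetrize(correlation_matrix(seq, 5))
    for name_u, u in COMPONENTS.items():
        for name_v, v in COMPONENTS.items():
            print(name_u, name_v, project(S, u, v))
```

For any matrix with the two permutation symmetries, the projections between different combinations vanish, because each pair contains one vector that changes sign under a swap that leaves the other unchanged. Repeating the autocorrelation projections over a range of d would give the distance dependence whose power-law behavior the abstract describes.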