Abstract
Principal component analysis (PCA) is a ubiquitous method of multivariate statistics that focuses on the eigenvalues $\lambda$ and eigenvectors of the sample covariance matrix of a data set. We consider $p$ data vectors $\xi_i$ of dimension $N$, drawn from a distribution with covariance matrix $C$. We use the replica method to evaluate the expected eigenvalue distribution $\rho(\lambda)$ as $N \to \infty$ with $p = \alpha N$ for some fixed $\alpha$. In contrast to existing studies, we consider the case where $C$ contains a number of symmetry-breaking directions, so that the sample data set contains some definite structure. Explicitly, we set $C = \sigma^2 I + \sigma^2 \sum_{m=1}^{S} A_m B_m B_m^T$, with $A_m > 0$ for all $m$. We find that the bulk of the eigenvalues is distributed as in the case where the elements of $\xi_i$ are independent and identically distributed. With increasing $\alpha$, a series of phase transitions is observed at $\alpha = A_m^{-2}$, $m = 1, 2, \ldots, S$; at each transition a single delta function, $\delta(\lambda - \lambda_u(A_m))$, separates from the upper edge of the bulk distribution, where $\lambda_u(A) = \sigma^2 (1 + A)\left(1 + (\alpha A)^{-1}\right)$. We confirm the results of the replica analysis by studying the Stieltjes transform of $\rho(\lambda)$. This suggests that the results obtained from the replica analysis are universal, irrespective of the distribution from which $\xi_i$ is drawn, provided the fourth moment of each element of $\xi_i$ exists.
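
The closed-form results can be checked numerically with a small simulation. The sketch below is not the authors' code; it assumes zero-mean Gaussian data, orthonormal directions $B_m$, and a sample covariance normalised by $1/p$ without centering. It also takes the i.i.d. bulk to be the Marchenko-Pastur law with ratio $N/p = 1/\alpha$, whose upper edge is $\sigma^2 (1 + \alpha^{-1/2})^2$. The function names are hypothetical. One consistency check follows directly from the formulas: at $\alpha = A^{-2}$, $\lambda_u(A) = \sigma^2 (1 + A)^2$, which coincides with the bulk edge. So the delta function leaves the bulk exactly at the transition.

```python
import numpy as np


def outlier_eigenvalue(A, alpha, sigma2=1.0):
    """lambda_u(A) = sigma^2 (1 + A)(1 + 1/(alpha A)): location of the separated delta function."""
    return sigma2 * (1.0 + A) * (1.0 + 1.0 / (alpha * A))


def transition_alpha(A):
    """Value of alpha at which the eigenvalue for strength A separates from the bulk: A**-2."""
    return A ** -2.0


def bulk_upper_edge(alpha, sigma2=1.0):
    """Upper edge of the i.i.d. (Marchenko-Pastur) bulk, assuming N/p = 1/alpha."""
    return sigma2 * (1.0 + alpha ** -0.5) ** 2


def simulate_spectrum(N, alpha, A, sigma2=1.0, seed=0):
    """Descending eigenvalues of the sample covariance of p = alpha*N Gaussian vectors.

    Assumptions (not from the source): zero-mean Gaussian data, orthonormal B_m,
    and the uncentred 1/p normalisation of the sample covariance.
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    p = int(round(alpha * N))
    S = A.size
    # Orthonormal symmetry-breaking directions B_1..B_S, stored as columns of B.
    B, _ = np.linalg.qr(rng.standard_normal((N, S)))
    sigma = np.sqrt(sigma2)
    # x = sigma*z + sigma*sum_m sqrt(A_m) g_m B_m has covariance
    # sigma^2 I + sigma^2 sum_m A_m B_m B_m^T.
    Z = rng.standard_normal((p, N))
    G = rng.standard_normal((p, S)) * np.sqrt(A)
    X = sigma * (Z + G @ B.T)
    C_hat = X.T @ X / p
    return np.sort(np.linalg.eigvalsh(C_hat))[::-1]


if __name__ == "__main__":
    alpha, A = 2.0, [4.0, 2.0]  # alpha exceeds both thresholds 1/16 and 1/4
    eigs = simulate_spectrum(N=500, alpha=alpha, A=A, seed=1)
    for a, lam in zip(A, eigs):
        print(f"A={a}: threshold alpha={transition_alpha(a):.4f}, "
              f"lambda_u={outlier_eigenvalue(a, alpha):.4f}, sampled={lam:.4f}")
    print(f"bulk edge={bulk_upper_edge(alpha):.4f}, largest bulk eigenvalue={eigs[len(A)]:.4f}")
```

With finite $N$ the sampled outliers fluctuate around $\lambda_u(A_m)$ by an amount of order $p^{-1/2}$. The comparison is therefore only approximate, and the list `A` must be given in decreasing order so that it pairs correctly with the sorted eigenvalues.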
