Principal-component-analysis eigenvalue spectra from data with symmetry-breaking structure
- 27 February 2004
- journal article
- Published by American Physical Society (APS) in Physical Review E
- Vol. 69 (2), 026124
- https://doi.org/10.1103/physreve.69.026124
Abstract
Principal component analysis (PCA) is a ubiquitous method of multivariate statistics that focuses on the eigenvalues $\lambda$ and eigenvectors of the sample covariance matrix of a data set. We consider $p$ $N$-dimensional data vectors $\xi$ drawn from a distribution with covariance matrix $C$. We use the replica method to evaluate the expected eigenvalue distribution $\rho(\lambda)$ as $N \to \infty$ with $p = \alpha N$ for some fixed $\alpha$. In contrast to existing studies we consider the case where $C$ contains a number of symmetry-breaking directions, so that the sample data set contains some definite structure. Explicitly we set $C = \sigma^2 I + \sigma^2 \sum_{m=1}^{S} A_m B_m B_m^T$, with $A_m > 0$ for all $m$. We find that the bulk of the eigenvalues are distributed as for the case when the elements of $\xi$ are independent and identically distributed. With increasing $\alpha$ a series of phase transitions is observed at $\alpha = A_m^{-2}$, $m = 1, 2, \ldots, S$. At each transition a single delta function, $\delta(\lambda - \lambda_u(A_m))$, separates from the upper edge of the bulk distribution, where $\lambda_u(A) = \sigma^2 (1 + A)\,[1 + (\alpha A)^{-1}]$. We confirm the results of the replica analysis by studying the Stieltjes transform of $\rho(\lambda)$. This suggests that the results obtained from the replica analysis are universal, irrespective of the distribution from which $\xi$ is drawn, provided the fourth moment of each element of $\xi$ exists.
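The prediction in the abstract can be checked numerically. The following is a minimal sketch, not the authors' method: it draws Gaussian data from the stated covariance and compares the largest sample eigenvalues with the predicted delta-function position. Several choices here are assumptions not given in the abstract: the directions B_m are taken to be orthonormal (and, since Gaussian data are rotation invariant, aligned with the first S coordinate axes), the data have zero mean, and the sample covariance is normalised by p with no mean subtraction. The bulk edge used for comparison is the standard Marchenko-Pastur upper edge for i.i.d. elements, sigma^2 (1 + alpha^(-1/2))^2. The function names are illustrative only.

```python
import numpy as np


def lambda_u(A, alpha, sigma2=1.0):
    """Predicted position of the separated eigenvalue, as quoted in the abstract."""
    return sigma2 * (1.0 + A) * (1.0 + 1.0 / (alpha * A))


def bulk_upper_edge(alpha, sigma2=1.0):
    """Marchenko-Pastur upper edge for i.i.d. elements (assumed form of the bulk edge)."""
    return sigma2 * (1.0 + alpha ** -0.5) ** 2


def sample_spectrum(N, alpha, A_values, sigma2=1.0, seed=0):
    """Eigenvalues (ascending) of the sample covariance of p = alpha * N Gaussian vectors.

    Assumption: the symmetry-breaking directions B_m are orthonormal and aligned
    with the first S coordinate axes, so C is diagonal with entries
    sigma2 * (1 + A_m) along those axes and sigma2 elsewhere.
    """
    rng = np.random.default_rng(seed)
    p = int(alpha * N)
    scales = np.ones(N)
    scales[: len(A_values)] = np.sqrt(1.0 + np.asarray(A_values, dtype=float))
    # Each row is one N-dimensional data vector with covariance C.
    X = rng.standard_normal((p, N)) * np.sqrt(sigma2) * scales
    C_hat = X.T @ X / p  # sample covariance, zero mean assumed
    return np.linalg.eigvalsh(C_hat)


if __name__ == "__main__":
    N, alpha = 400, 2.0
    A_values = [2.0, 0.3]  # A = 2.0 has A**-2 < alpha (separated); A = 0.3 does not
    eigs = sample_spectrum(N, alpha, A_values)
    print("largest eigenvalue:       ", eigs[-1])
    print("predicted lambda_u(A=2.0):", lambda_u(2.0, alpha))
    print("second largest eigenvalue:", eigs[-2])
    print("bulk upper edge:          ", bulk_upper_edge(alpha))
```

With alpha = 2 and A = 2 the formula gives lambda_u = 3.75, while the assumed bulk edge is about 2.914. At finite N the largest sample eigenvalue fluctuates around the prediction, so only approximate agreement should be expected. At the transition point alpha = A^(-2) the two expressions coincide, which is consistent with the delta function detaching from the upper edge of the bulk.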