High-dimensional analysis of semidefinite relaxations for sparse principal components
Open Access
- 1 October 2009
- journal article
- Published by Institute of Mathematical Statistics in The Annals of Statistics
- Vol. 37 (5B)
- https://doi.org/10.1214/08-aos664
Abstract
Principal component analysis (PCA) is a classical method for dimensionality reduction based on extracting the dominant eigenvectors of the sample covariance matrix. However, PCA is well known to behave poorly in the "large $p$, small $n$" setting, in which the problem dimension $p$ is comparable to or larger than the sample size $n$. This paper studies PCA in this high-dimensional regime, but under the additional assumption that the maximal eigenvector is sparse, say, with at most $k$ nonzero components. We consider a spiked covariance model in which a base matrix is perturbed by adding a $k$-sparse maximal eigenvector, and we analyze two computationally tractable methods for recovering the support set of this maximal eigenvector, as follows: (a) a simple diagonal thresholding method, which transitions from success to failure as a function of the rescaled sample size $\theta_{\mathrm{dia}}(n,p,k)=n/[k^2\log(p-k)]$; and (b) a more sophisticated semidefinite programming (SDP) relaxation, which succeeds once the rescaled sample size $\theta_{\mathrm{sdp}}(n,p,k)=n/[k\log(p-k)]$ is larger than a critical threshold. In addition, we prove that no method, even one with exponential-time complexity, can succeed in recovering the support if the order parameter $\theta_{\mathrm{sdp}}(n,p,k)$ is below a threshold. Our results thus highlight an interesting trade-off between computational and statistical efficiency in high-dimensional inference.
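As a rough illustration (not the authors' implementation), the diagonal thresholding method described in the abstract can be sketched in a few lines: estimate each coordinate's variance from the diagonal of the sample covariance matrix, and declare the $k$ coordinates with the largest sample variances to be the support of the sparse maximal eigenvector. All function names, parameter values, and the simulation setup below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def diagonal_thresholding_support(X, k):
    """Estimate the support of a k-sparse leading eigenvector by
    selecting the k coordinates with the largest sample variances
    (i.e., the k largest entries on the diagonal of (1/n) X^T X)."""
    n, p = X.shape
    sample_variances = (X ** 2).sum(axis=0) / n
    return np.sort(np.argsort(sample_variances)[-k:])

# Illustrative spiked-covariance simulation (assumed setup): identity
# base matrix plus a rank-one spike beta * z z^T with k-sparse unit
# vector z supported on the first k coordinates.
rng = np.random.default_rng(0)
p, k, beta = 200, 5, 4.0
z = np.zeros(p)
z[:k] = 1.0 / np.sqrt(k)               # k-sparse maximal eigenvector
cov = np.eye(p) + beta * np.outer(z, z)
n = int(2.0 * k**2 * np.log(p - k))    # scaling suggested by theta_dia
X = rng.multivariate_normal(np.zeros(p), cov, size=n)
print(diagonal_thresholding_support(X, k))  # ideally {0, ..., k-1}
```

With this scaling of $n$, the spiked coordinates have population variance $1 + \beta/k$ versus $1$ elsewhere, so the sample-variance gap is large relative to estimation noise and the support is recovered with high probability; shrinking $n$ well below $k^2\log(p-k)$ makes the method fail, matching the role of $\theta_{\mathrm{dia}}$.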