Robust Principal Components

Abstract
This paper proposes a new algorithm for obtaining an eigenvalue decomposition of the sample covariance matrix of a multivariate dataset. The algorithm is based on the rotation technique employed by Ammann and Van Ness (1988a,b) to obtain a robust solution to an errors-in-variables problem. When this rotation technique is combined with an iterative reweighting of the data, a robust eigenvalue decomposition is obtained, with important applications to principal component analysis. Monte Carlo simulations compare ordinary principal component analysis, based on the standard eigenvalue decomposition, with the proposed algorithm, referred to as ROPRC. ROPRC is reasonably efficient relative to the standard eigenvalue decomposition when the data are Gaussian, and substantially outperforms it when outliers are present or the data have a heavy-tailed distribution. The algorithm also returns useful numerical diagnostic information in the form of a matrix of weights describing the importance of each observation in the determination of each principal component. These weights are used to obtain robust estimates of the eigenvalues and of the underlying covariance structure of the data. An example illustrates the use of ROPRC and compares its results with those of standard principal component analysis.
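To illustrate the general idea of an iteratively reweighted eigenvalue decomposition with per-observation diagnostic weights, the following is a minimal sketch. It is not the ROPRC algorithm itself (the rotation step of Ammann and Van Ness is not reproduced here); it uses a generic Huber-type downweighting of large Mahalanobis distances, and the function name, cutoff `c`, and iteration scheme are illustrative assumptions.

```python
import numpy as np

def robust_pca_weights(X, c=2.5, n_iter=20, tol=1e-8):
    """Illustrative iteratively reweighted covariance eigendecomposition.

    NOTE: a simplified sketch, not the ROPRC rotation algorithm.
    Observations with large Mahalanobis distances are downweighted,
    the weighted covariance is recomputed, and the cycle repeats until
    the weights stabilize.  The returned weight vector plays the
    diagnostic role the abstract describes: small weights flag
    observations that had little influence on the fitted components.
    """
    n, p = X.shape
    w = np.ones(n)
    for _ in range(n_iter):
        mu = np.average(X, axis=0, weights=w)
        Xc = X - mu
        cov = (w[:, None] * Xc).T @ Xc / w.sum()
        # Mahalanobis distance of each row under the current estimate
        d2 = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(cov), Xc)
        d = np.sqrt(np.maximum(d2, 1e-12))
        # Huber-type weights: full weight within the cutoff, c/d outside
        w_new = np.where(d <= c, 1.0, c / d)
        if np.max(np.abs(w_new - w)) < tol:
            w = w_new
            break
        w = w_new
    # Robust eigendecomposition, eigenvalues sorted in decreasing order
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order], w
```

On clean Gaussian data the weights stay near one and the result is close to the ordinary eigendecomposition; gross outliers receive small weights and are largely excluded from the covariance estimate, mirroring the robustness behavior the simulations report for ROPRC.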