Transposable regularized covariance models with an application to missing data imputation
Open Access
- 1 June 2010
- journal article
- Published by Institute of Mathematical Statistics in The Annals of Applied Statistics
- Vol. 4 (2) , 764-790
- https://doi.org/10.1214/09-aoas314
Abstract
Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that either the rows, columns or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so-called transposable regularized covariance models allow for maximum likelihood estimation of the mean and nonsingular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility.
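To make the penalized objective described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes an m-by-n matrix X whose mean is the sum of a row mean vector and a column mean vector, a Kronecker-structured covariance built from a row covariance and a column covariance, and an elementwise penalty of the form sum |a_ij|^q on each inverse covariance. The function name `penalized_loglik`, the argument names, and the exact penalty form are illustrative assumptions; the additive constant of the normal density is omitted.

```python
import numpy as np


def penalized_loglik(X, nu, mu, Sigma_inv, Delta_inv, rho_r=0.0, rho_c=0.0, q=1):
    """Penalized matrix-normal log-likelihood, up to an additive constant.

    X         : (m, n) fully observed data matrix
    nu, mu    : row means (length m) and column means (length n)
    Sigma_inv : (m, m) inverse covariance among rows
    Delta_inv : (n, n) inverse covariance among columns
    rho_r/c   : penalty weights (assumed names) for rows and columns
    q         : exponent of the elementwise penalty (assumed form)
    """
    m, n = X.shape
    # Mean-restricted structure: entry (i, j) has mean nu[i] + mu[j].
    M = nu[:, None] + mu[None, :]
    R = X - M

    sign_r, logdet_r = np.linalg.slogdet(Sigma_inv)
    sign_c, logdet_c = np.linalg.slogdet(Delta_inv)
    if sign_r <= 0 or sign_c <= 0:
        raise ValueError("inverse covariances must have positive determinant")

    # Quadratic form tr(Sigma^{-1} R Delta^{-1} R^T).
    quad = np.trace(Sigma_inv @ R @ Delta_inv @ R.T)

    # Additive penalties on both inverse covariance matrices.
    penalty = (rho_r * np.sum(np.abs(Sigma_inv) ** q)
               + rho_c * np.sum(np.abs(Delta_inv) ** q))

    # The row term is weighted by the number of columns and vice versa,
    # because det(Delta kron Sigma) = det(Delta)^m * det(Sigma)^n.
    return 0.5 * n * logdet_r + 0.5 * m * logdet_c - 0.5 * quad - penalty


if __name__ == "__main__":
    X = np.ones((2, 3))
    value = penalized_loglik(X, np.zeros(2), np.zeros(3),
                             np.eye(2), np.eye(3), rho_r=0.5, rho_c=0.5)
    print(value)
```

In an EM-type imputation scheme of the kind the abstract mentions, an objective like this would be maximized over the means and inverse covariances while missing entries are replaced by their conditional expectations; that alternating loop is not shown here.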