High-dimensional classification using features annealed independence rules
Open Access
- 1 December 2008
- journal article
- Published by Institute of Mathematical Statistics in The Annals of Statistics
- Vol. 36 (6), 2605–2637
- https://doi.org/10.1214/07-AOS504
Abstract
Classification using high-dimensional features arises frequently in many contemporary statistical studies, such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is poorly understood. In a seminal paper, Bickel and Levina [Bernoulli 10 (2004) 989–1010] show that the Fisher discriminant performs poorly due to diverging spectra, and they propose using the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as poor as random guessing, due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as poorly as random guessing. Thus, it is important to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The optimal number of features, or equivalently the threshold value of the test statistics, is chosen based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
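The abstract describes the FAIR procedure at a high level: rank features by the absolute two-sample t-statistic, retain the top m, and classify with the independence (diagonal) rule on the retained features. The sketch below is a minimal illustration of that recipe, not the authors' implementation; it assumes two classes coded 0/1, equal priors, and that m is supplied directly rather than chosen via the paper's error-bound criterion.

```python
import numpy as np

def two_sample_t(X, y):
    """Per-feature two-sample t-statistics for classes coded 0/1."""
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = len(X0), len(X1)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    v0, v1 = X0.var(axis=0, ddof=1), X1.var(axis=0, ddof=1)
    return (m1 - m0) / np.sqrt(v0 / n0 + v1 / n1)

def fair_fit(X, y, m):
    """Keep the m features with largest |t| and fit an independence rule
    (diagonal covariance) on them."""
    t = two_sample_t(X, y)
    keep = np.argsort(-np.abs(t))[:m]
    X0, X1 = X[y == 0][:, keep], X[y == 1][:, keep]
    n0, n1 = len(X0), len(X1)
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # pooled per-feature variances; off-diagonal covariances are ignored
    s2 = ((n0 - 1) * X0.var(axis=0, ddof=1)
          + (n1 - 1) * X1.var(axis=0, ddof=1)) / (n0 + n1 - 2)
    return keep, mu0, mu1, s2

def fair_predict(X, keep, mu0, mu1, s2):
    """Assign each row to the class whose centroid is closer in the
    variance-weighted (diagonal) metric."""
    Z = X[:, keep]
    d0 = ((Z - mu0) ** 2 / s2).sum(axis=1)
    d1 = ((Z - mu1) ** 2 / s2).sum(axis=1)
    return (d1 < d0).astype(int)

# Toy usage on synthetic data: 40 samples, 1000 features, 10 informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 1000))
y = np.repeat([0, 1], 20)
X[y == 1, :10] += 1.0
keep, mu0, mu1, s2 = fair_fit(X, y, m=10)
print(fair_predict(X, keep, mu0, mu1, s2))
```

Restricting the rule to a diagonal covariance keeps it well defined when the feature count far exceeds the sample size, which is exactly the regime the paper targets; selecting only the top-ranked features is what counters the noise-accumulation effect the abstract describes.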
This publication has 21 references indexed in Scilit:
- Relaxed Lasso. Computational Statistics & Data Analysis, 2007.
- Moderate deviations for two sample t-statistics. ESAIM: Probability and Statistics, 2007.
- Best subset selection, persistence in high-dimensional statistical learning and optimization under l1 constraint. The Annals of Statistics, 2006.
- Prediction by Supervised Principal Components. Journal of the American Statistical Association, 2006.
- Some theory for Fisher's linear discriminant function, 'naive Bayes', and some alternatives when there are many more variables than observations. Bernoulli, 2004.
- Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli, 2004.
- PLS Dimension Reduction for Classification with Microarray Data. Statistical Applications in Genetics and Molecular Biology, 2004.
- Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data. Journal of the American Statistical Association, 2002.
- Test of Significance Based on Wavelet Thresholding and Neyman's Truncation. Journal of the American Statistical Association, 1996.
- Regularized Discriminant Analysis. Journal of the American Statistical Association, 1989.