High-dimensional classification using features annealed independence rules
Open Access
- 1 December 2008
- journal article
- Published by Institute of Mathematical Statistics in The Annals of Statistics
- Vol. 36 (6), 2605–2637
- https://doi.org/10.1214/07-AOS504
Abstract
Classification using high-dimensional features arises frequently in many contemporary statistical studies, such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is poorly understood. In a seminal paper, Bickel and Levina [Bernoulli 10 (2004) 989–1010] show that the Fisher discriminant performs poorly due to diverging spectra, and they propose using the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as poor as random guessing, due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as poorly as random guessing. Thus, it is important to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The optimal number of features, or equivalently the threshold value of the test statistics, is chosen based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
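The abstract describes the FAIR procedure at a high level: rank features by the absolute two-sample t-statistic, retain the top m, and classify with the independence (diagonal) rule on the retained features. The sketch below is a minimal illustration of that recipe, not the authors' implementation; it assumes two classes coded 0/1, equal priors, and that m is supplied directly rather than chosen via the paper's error-bound criterion.

```python
import numpy as np

def two_sample_t(X, y):
    """Per-feature two-sample t-statistics for classes coded 0/1."""
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = len(X0), len(X1)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    v0, v1 = X0.var(axis=0, ddof=1), X1.var(axis=0, ddof=1)
    return (m1 - m0) / np.sqrt(v0 / n0 + v1 / n1)

def fair_fit(X, y, m):
    """Keep the m features with largest |t| and fit an independence rule
    (diagonal covariance) on them."""
    t = two_sample_t(X, y)
    keep = np.argsort(-np.abs(t))[:m]
    X0, X1 = X[y == 0][:, keep], X[y == 1][:, keep]
    n0, n1 = len(X0), len(X1)
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # pooled per-feature variances; off-diagonal covariances are ignored
    s2 = ((n0 - 1) * X0.var(axis=0, ddof=1)
          + (n1 - 1) * X1.var(axis=0, ddof=1)) / (n0 + n1 - 2)
    return keep, mu0, mu1, s2

def fair_predict(X, keep, mu0, mu1, s2):
    """Assign each row to the class whose centroid is closer in the
    variance-weighted (diagonal) metric."""
    Z = X[:, keep]
    d0 = ((Z - mu0) ** 2 / s2).sum(axis=1)
    d1 = ((Z - mu1) ** 2 / s2).sum(axis=1)
    return (d1 < d0).astype(int)

# Toy usage on synthetic data: 40 samples, 1000 features, 10 informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 1000))
y = np.repeat([0, 1], 20)
X[y == 1, :10] += 1.0
keep, mu0, mu1, s2 = fair_fit(X, y, m=10)
print(fair_predict(X, keep, mu0, mu1, s2))
```

Restricting the rule to a diagonal covariance keeps it well defined when the feature count far exceeds the sample size, which is exactly the regime the paper targets; selecting only the top-ranked features is what counters the noise-accumulation effect the abstract describes.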
This publication has 21 references indexed in Scilit:
- Relaxed Lasso. Computational Statistics & Data Analysis, 2007.
- Moderate deviations for two sample t-statistics. ESAIM: Probability and Statistics, 2007.
- Best subset selection, persistence in high-dimensional statistical learning and optimization under l1 constraint. The Annals of Statistics, 2006.
- Prediction by Supervised Principal Components. Journal of the American Statistical Association, 2006.
- Some theory for Fisher's linear discriminant function, 'naive Bayes', and some alternatives when there are many more variables than observations. Bernoulli, 2004.
- Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli, 2004.
- PLS Dimension Reduction for Classification with Microarray Data. Statistical Applications in Genetics and Molecular Biology, 2004.
- Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data. Journal of the American Statistical Association, 2002.
- Test of Significance Based on Wavelet Thresholding and Neyman's Truncation. Journal of the American Statistical Association, 1996.
- Regularized Discriminant Analysis. Journal of the American Statistical Association, 1989.