Methods of variable selection in regression modeling
- 1 January 1998
- journal article
- research article
- Published by Taylor & Francis in Communications in Statistics - Simulation and Computation
- Vol. 27 (3), 711-734
- https://doi.org/10.1080/03610919808813505
Abstract
Simulation was used to evaluate the performances of several methods of variable selection in regression modeling: stepwise regression based on partial F-tests, stepwise minimization of Mallows’ C_p statistic and Schwarz’s Bayes Information Criterion (BIC), and regression trees constructed with two kinds of pruning. Five to 25 covariates were generated in multivariate clusters, and responses were obtained from an ordinary linear regression model involving three of the covariates; each data set had 50 observations. The regression-tree approaches were markedly inferior to the other methods in discriminating between informative and noninformative covariates, and their predictions of responses in “new” data sets were much more variable and less accurate than those of the other methods. The F-test, C_p and BIC approaches were similar in their overall frequencies of “correct” decisions about inclusion or exclusion of covariates, with the C_p method leading to the largest models and the BIC method to the smallest. The three methods were also comparable in their ability to predict “new” observations, with perhaps a tendency for the C_p approach to perform relatively more poorly for large covariate pools. The abilities of all methods to discriminate between informative and noninformative covariates and to predict “new” observations decreased with increasing size of the covariate pool.
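The kind of experiment the abstract describes can be illustrated with a minimal sketch of one of the compared methods, stepwise minimization of BIC, run on a single simulated data set. This is not the authors' code. The abstract fixes only the sample size (50), the pool size (5 to 25) and the number of informative covariates (three); the within-cluster correlation `rho`, the cluster size, the unit coefficients, the unit noise variance, the Gaussian form of BIC and the function names `simulate`, `bic` and `stepwise_bic` are all assumptions made for illustration.

```python
import numpy as np


def simulate(n=50, p=10, informative=(0, 1, 2), rho=0.5, cluster_size=5, seed=0):
    """Draw clustered covariates and a linear response (illustrative settings only)."""
    rng = np.random.default_rng(seed)
    # Block-diagonal correlation matrix: covariates in the same cluster
    # share correlation rho; covariates in different clusters are independent.
    cov = np.eye(p)
    for start in range(0, p, cluster_size):
        end = min(start + cluster_size, p)
        cov[start:end, start:end] = rho
    np.fill_diagonal(cov, 1.0)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    # Assumed model: unit coefficients on the informative covariates, N(0, 1) noise.
    y = X[:, list(informative)].sum(axis=1) + rng.normal(size=n)
    return X, y


def bic(X, y, subset):
    """Gaussian BIC, n*log(RSS/n) + k*log(n), for an intercept plus the given columns."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + design.shape[1] * np.log(n)


def stepwise_bic(X, y):
    """Add or drop one covariate at a time until no single move lowers BIC."""
    selected = []
    best = bic(X, y, selected)
    while True:
        # Candidate moves: add one unused covariate, or drop one selected covariate.
        moves = [selected + [j] for j in range(X.shape[1]) if j not in selected]
        moves += [[k for k in selected if k != j] for j in selected]
        score, move = min(((bic(X, y, m), m) for m in moves), key=lambda t: t[0])
        if score >= best:
            return sorted(selected)
        best, selected = score, move


X, y = simulate()
chosen = stepwise_bic(X, y)
print("selected covariates:", chosen)
```

Repeating this over many seeds and pool sizes, and tallying how often each informative and noninformative covariate is selected, would give frequencies of "correct" decisions of the kind the abstract reports; the specific numbers would depend on the assumed settings above.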
This publication has 13 references indexed in Scilit:
- Multivariable Prognostic Models: Issues in Developing Models, Evaluating Assumptions and Adequacy, and Measuring and Reducing Errors. Statistics in Medicine, 1996
- Bayes Factors. Journal of the American Statistical Association, 1995
- AIC Model Selection in Overdispersed Capture‐Recapture Data. Ecology, 1994
- Submodel Selection and Evaluation in Regression. The X-Random Case. International Statistical Review, 1992
- Backward, forward and stepwise automated subset selection algorithms: Frequency of obtaining authentic and noise variables. British Journal of Mathematical and Statistical Psychology, 1992
- Predicting crustacean zooplankton species richness. Limnology and Oceanography, 1992
- Model Specification: The Views of Fisher and Neyman, and Later Developments. Statistical Science, 1990
- Role of Models in Statistical Analysis. Statistical Science, 1990
- Estimating the Dimension of a Model. The Annals of Statistics, 1978
- Regressions by Leaps and Bounds. Technometrics, 1974