Abstract
When a regression problem contains many predictor variables, it is rarely wise to try to fit the data by means of a least squares regression on all of the predictor variables. Usually, a regression equation based on a few variables will be more accurate and certainly simpler. There are various methods for picking “good” subsets of variables, and programs that do such procedures are part of every widely used statistical package. The most common methods are based on stepwise addition or deletion of variables and on “best subsets.” The latter refers to a search method that, given the number of variables to be in the equation (say, five), locates that regression equation based on five variables that has the lowest residual sum of squares among all five variable equations. All of these procedures generate a sequence of regression equations, the first based on one variable, the next based on two variables, and so on. Each member of this sequence is called a submodel and the number of variables in the e...

This publication has 0 references indexed in Scilit: