Selection of Variables for Fitting Equations to Data

Abstract
Selecting a suitable equation to represent a set of multifactor data that was collected for other purposes in a plant, pilot-plant, or laboratory can be troublesome. If there are k independent variables, there are 2 k possible linear equations to be examined; one equation using none of the variables, k using one variable, k(k – 1)/2 using two variables, etc. Often there are several equally good candidates. Selection depends on whether one needs a simple interpolation formula or estimates of the effects of individual independent variables. Fractional factorial designs for sampling the 2 k possibilities and a new statistic proposed by C. Mallows simplify the search for the best candidate. With the new statistic, regression equations can be compared graphically with respect to both bias and random error.