Abstract
Following Box and Cox (1964), the use of transformations in regression analysis is now common; recently there has been emphasis on diagnostic methods for transformation, much of which has involved deletion of data cases. Summaries were given by Cook and Weisberg (1982) and Atkinson (1985). This article obtains diagnostics for the estimated regression parameter of the Box-Cox transformation of the response variable in the linear model. Instead of deleting cases, the more general notions of perturbing assumptions of the model, or components of the data, are employed, as in the local-influence approach of Cook (1986). The diagnostics then arise from local changes to the transformation parameter estimate caused by small perturbations; the case direction in which small perturbations have the greatest effect is the main diagnostic quantity. An appeal of the approach is that it allows simultaneous perturbations affecting all data cases, not just one-at-a-time deletions; it can thus point to groups of influential cases, giving some local indications of possible masking effects. These are usually said to occur when single deletions produce small changes in parameter estimates, whereas deletions of pairs or small groups of cases cause large changes. Any outlying direction cosines with similar signs are indications of cases possibly associated with masking. In the transformation problem, diagnostics are first obtained from perturbing the constant model variances, a general way of detecting case sensitivity. The diagnostics are shown to be functions of the residuals after transformation and their derivatives with respect to the transformation parameter, a second set of residuals. By allowing data perturbations, the approach can also be used to produce more specific diagnostics directed at sensitive values in either the response or explanatory data. The methods are illustrated on the poison data originally used by Box and Cox (1964). 
General sensitivity is attributed to two particular pairs; their deletion changes the maximum likelihood estimate of the transformation parameter from −.75 to −.55 and −.97, respectively. Only one of these pairs is sensitive to its response values alone being perturbed. In this instance, as in general, diagnostics cannot explain (in subject-matter terms) the reasons for the influential cases. Their purpose is to alert the investigator to possible difficulties with the data in relation to the model being fitted. Early notions of this work were reported in the author's discussion contribution to Cook (1986).
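
The variance-perturbation idea described in the abstract can be illustrated numerically. The following is only a minimal sketch under stated assumptions, not the article's derivation. It assumes the perturbation takes the form of case weights w_i (model variance sigma^2 / w_i, with all w_i = 1 as the unperturbed model). It approximates the local change in the estimated transformation parameter by central finite differences rather than the analytic expressions in residuals that the article obtains. It also uses synthetic data, not the poison data. All function names (`boxcox_normalized`, `weighted_rss`, `lambda_hat`, `influence_direction`) are hypothetical.

```python
import numpy as np

def boxcox_normalized(y, lam):
    """Box-Cox transform (y**lam - 1)/lam, or log(y) at lam = 0, scaled by the
    geometric mean so residual sums of squares are comparable across lam."""
    log_y = np.log(y)
    gm = np.exp(np.mean(log_y))              # geometric mean of the responses
    if abs(lam) < 1e-8:
        return gm * log_y
    return np.expm1(lam * log_y) / (lam * gm ** (lam - 1.0))

def weighted_rss(lam, X, y, w):
    """Weighted residual sum of squares of the transformed response on X."""
    z = boxcox_normalized(y, lam)
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
    resid = z - X @ coef
    return float(np.sum(w * resid ** 2))

def lambda_hat(X, y, w, lo=-3.0, hi=3.0, tol=1e-8):
    """Estimate lam by minimizing the weighted RSS of the normalized transform
    (equivalent to maximizing the profile likelihood) via golden-section
    search on [lo, hi]; the bracket is an assumption of this sketch."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = weighted_rss(c, X, y, w), weighted_rss(d, X, y, w)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = weighted_rss(c, X, y, w)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = weighted_rss(d, X, y, w)
    return 0.5 * (a + b)

def influence_direction(X, y, h=1e-2):
    """Finite-difference gradient of lambda_hat with respect to the case
    weights at w = 1, normalized to unit length (direction cosines)."""
    n = len(y)
    grad = np.empty(n)
    for i in range(n):
        up, down = np.ones(n), np.ones(n)
        up[i] += h
        down[i] -= h
        grad[i] = (lambda_hat(X, y, up) - lambda_hat(X, y, down)) / (2.0 * h)
    return grad / np.linalg.norm(grad)

# Synthetic illustration only: log(y) is linear in x, so lam near 0 is expected.
rng = np.random.default_rng(0)
n = 30
x = rng.uniform(0.0, 3.0, size=n)
y = np.exp(1.0 + 0.8 * x + 0.2 * rng.standard_normal(n))
X = np.column_stack([np.ones(n), x])

lam0 = lambda_hat(X, y, np.ones(n))          # unperturbed estimate
direction = influence_direction(X, y)        # one direction cosine per case
```

In this sketch, entries of `direction` that are large in absolute value mark the cases whose variance perturbation moves the estimate most. Several outlying entries with the same sign would be the candidates to inspect as a group, in the sense the abstract describes for masking.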
