On the Cost of Data Analysis
- 1 September 1992
- journal article
- research article
- Published by Taylor & Francis in Journal of Computational and Graphical Statistics
- Vol. 1 (3), 213-229
- https://doi.org/10.1080/10618600.1992.10474582
Abstract
A regression analysis usually consists of several stages, such as variable selection, transformation and residual diagnosis. Inference is often made from the selected model without regard to the model selection methods that preceded it. This can result in overoptimistic and biased inferences. We first characterize data-analytic actions as functions acting on regression models. We investigate the extent of the problem and test bootstrap, jackknife, and sample-splitting methods for ameliorating it. We also demonstrate an interactive LISP-STAT system for assessing the cost of the data analysis while it is taking place.
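The abstract's central idea can be illustrated with a small sketch: treat the data analysis, selection included, as a single function from data to a fitted model, and then rerun that whole function on resamples. This is not the paper's LISP-STAT system.

The sketch makes several assumptions:

- The "data-analytic action" (the hypothetical function `analysis`) is a toy one-step forward selection. It picks the single candidate predictor with the smallest residual sum of squares and then fits it by least squares.
- `bootstrap_optimism` follows the general excess-error (optimism) bootstrap idea: the in-sample error of each resampled fit is compared with its error on the original data.
- `split_sample_mse` shows the sample-splitting alternative.

All function names are illustrative.

```python
import random
import statistics


def fit_simple(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx > 0 else 0.0  # guard against a constant column
    return my - b * mx, b


def analysis(X, y):
    """Hypothetical data-analytic action: choose the candidate column whose
    simple regression has the smallest residual sum of squares, then fit it.
    Returns (column index, intercept, slope)."""
    best = None
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        a, b = fit_simple(col, y)
        rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(col, y))
        if best is None or rss < best[0]:
            best = (rss, j, a, b)
    return best[1:]


def mse(model, X, y):
    """Mean squared prediction error of a fitted (j, a, b) model on (X, y)."""
    j, a, b = model
    return statistics.fmean((yi - a - b * row[j]) ** 2 for row, yi in zip(X, y))


def bootstrap_optimism(X, y, n_boot=200, seed=0):
    """Bootstrap estimate of how much the apparent error understates the
    prediction error when selection is part of the analysis."""
    rng = random.Random(seed)
    n = len(y)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        model_b = analysis(Xb, yb)  # rerun the WHOLE pipeline, selection included
        diffs.append(mse(model_b, X, y) - mse(model_b, Xb, yb))
    return statistics.fmean(diffs)


def split_sample_mse(X, y, seed=0):
    """Sample splitting: run the pipeline on one random half, score on the other."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    train, test = idx[: len(idx) // 2], idx[len(idx) // 2 :]
    model = analysis([X[i] for i in train], [y[i] for i in train])
    return mse(model, [X[i] for i in test], [y[i] for i in test])


if __name__ == "__main__":
    # Pure noise: no candidate predictor is actually related to y.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(30)]
    y = [rng.gauss(0, 1) for _ in range(30)]
    apparent = mse(analysis(X, y), X, y)
    print("apparent MSE:", apparent)
    print("optimism-corrected MSE:", apparent + bootstrap_optimism(X, y))
    print("sample-split MSE:", split_sample_mse(X, y))
```

On pure-noise data like this, the apparent error of the selected model is expected to understate its prediction error. Both the corrected and the split-sample estimates attempt to account for that. The key design choice is that each resample repeats the selection step rather than refitting only the already-chosen model.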