Modeling and variable selection in epidemiologic analysis.
- 1 March 1989
- Journal article
- Published by American Public Health Association in American Journal of Public Health
- Vol. 79(3), pp. 340-349
- https://doi.org/10.2105/ajph.79.3.340
Abstract
This paper provides an overview of problems in multivariate modeling of epidemiologic data, and examines some proposed solutions. Special attention is given to the task of model selection, which involves selection of the model form, selection of the variables to enter the model, and selection of the form of these variables in the model. Several conclusions are drawn, among them: a) model and variable forms should be selected based on regression diagnostic procedures, in addition to goodness-of-fit tests; b) variable-selection algorithms in current packaged programs, such as conventional stepwise regression, can easily lead to invalid estimates and tests of effect; and c) variable selection is better approached by direct estimation of the degree of confounding produced by each variable than by significance-testing algorithms. As a general rule, before using a model to estimate effects, one should evaluate the assumptions implied by the model against both the data and prior information.
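Conclusion c) recommends judging each candidate variable by how much confounding it produces, rather than by a significance test. A minimal sketch of that idea follows, under assumptions that are not taken from the paper: a binary exposure and outcome summarized in 2x2 tables, adjustment by a Mantel-Haenszel odds ratio over the strata of one candidate covariate at a time, and a 10% change cutoff. The function names, the covariates `Z` and `W`, and the counts are all hypothetical illustrations, not the author's procedure.

```python
"""Change-in-estimate screening of candidate confounders (illustrative sketch)."""


def odds_ratio(a, b, c, d):
    """Odds ratio for one 2x2 table.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    """
    return (a * d) / (b * c)


def collapse(strata):
    """Sum stratum-specific tables into a single crude (unadjusted) table."""
    return tuple(sum(table[i] for table in strata) for i in range(4))


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across the strata of one covariate."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


def change_in_estimate(strata, threshold_pct=10.0):
    """Return (crude OR, adjusted OR, percent change, exceeds cutoff).

    The percent change is measured relative to the adjusted OR.
    threshold_pct=10 is an assumed conventional cutoff, not a value from the paper.
    """
    crude = odds_ratio(*collapse(strata))
    adjusted = mantel_haenszel_or(strata)
    pct_change = abs(crude - adjusted) / adjusted * 100.0
    return crude, adjusted, pct_change, pct_change >= threshold_pct


# Hypothetical data: the same 375 subjects stratified by two candidate covariates.
candidates = {
    "Z": [(40, 60, 10, 15), (5, 45, 20, 180)],   # OR = 1 within each stratum
    "W": [(15, 35, 10, 65), (30, 70, 20, 130)],  # stratum ORs equal the crude OR
}

results = {name: change_in_estimate(strata) for name, strata in candidates.items()}
for name, (crude, adjusted, pct, flagged) in results.items():
    print(f"{name}: crude OR {crude:.2f}, adjusted OR {adjusted:.2f}, "
          f"change {pct:.1f}%, retain as confounder: {flagged}")
```

Here stratifying on `Z` moves the odds ratio from about 2.79 to 1.00, so `Z` would be kept as a confounder, while stratifying on `W` leaves the estimate unchanged. No significance test on the covariate is involved. In a regression setting the same comparison would be made between the exposure coefficient with and without the covariate; measuring the change on the log scale, or relative to the crude estimate, are equally defensible choices.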