Quantifying epidemiologic risk factors using non‐parametric regression: model selection remains the greatest challenge
- 10 October 2003
- journal article
- research article
- Published by Wiley in Statistics in Medicine
- Vol. 22 (21) , 3369-3381
- https://doi.org/10.1002/sim.1638
Abstract
Logistic regression is widely used to estimate relative risks (odds ratios) from case–control studies, but when the study exposure is continuous, standard parametric models may not accurately characterize the exposure–response curve. Semi‐parametric generalized linear models provide a useful extension. In these models, the exposure of interest is modelled flexibly using a regression spline or a smoothing spline, while other variables are modelled using conventional methods. When coupled with a model‐selection procedure based on minimizing a cross‐validation score, this approach provides a non‐parametric, objective, and reproducible method to characterize the exposure–response curve by one or several models with a favourable bias–variance trade‐off. We applied this approach to case–control data to estimate the dose–response relationship between alcohol consumption and risk of oral cancer among African Americans. We did not find a uniquely ‘best’ model, but results using linear, cubic, and smoothing splines were consistent: there does not appear to be a risk‐free threshold for alcohol consumption vis‐à‐vis the development of oral cancer. This finding was not apparent using a standard step‐function model. In our analysis, the cross‐validation curve had a global minimum and also a local minimum. In general, the phenomenon of multiple local minima makes it more difficult to interpret the results, and may present a computational roadblock to non‐parametric generalized additive models of multiple continuous exposures. Nonetheless, the semi‐parametric approach appears to be a practical advance. Published in 2003 by John Wiley & Sons, Ltd.
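The general idea of pairing a regression spline with cross‐validated model selection can be illustrated with a minimal sketch. It is not the authors' procedure or data: the exposure values are synthetic, the spline is a simple linear (truncated‐line) basis, the candidate knot locations are arbitrary, and the score is a 5‐fold cross‐validated deviance per observation. All function names (`linear_spline_basis`, `fit_logistic`, `cv_score`) are hypothetical helpers written for this example.

```python
import numpy as np

def linear_spline_basis(x, knots):
    """Design matrix: intercept, x, and one truncated-linear term per knot."""
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def fit_logistic(X, y, n_iter=50, ridge=1e-6):
    """Logistic regression by Newton-Raphson, with a tiny ridge for stability."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        grad = X.T @ (y - p) - ridge * beta
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

def deviance(X, y, beta):
    """Binomial deviance (-2 log-likelihood) of y under the fitted model."""
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -2.0 * np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def cv_score(x, y, knots, n_folds=5, seed=0):
    """K-fold cross-validated deviance per observation for one knot set."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(x)) % n_folds  # same split for every model
    total = 0.0
    for f in range(n_folds):
        train, test = folds != f, folds == f
        beta = fit_logistic(linear_spline_basis(x[train], knots), y[train])
        total += deviance(linear_spline_basis(x[test], knots), y[test], beta)
    return total / len(x)

# Synthetic data (an assumption for illustration): exposure in arbitrary
# units, with log-odds rising linearly from zero exposure (no threshold).
rng = np.random.default_rng(1)
x = rng.gamma(shape=1.5, scale=10.0, size=1500)
y = (rng.random(x.size) < 1.0 / (1.0 + np.exp(-(-1.0 + 0.04 * x)))).astype(float)

# Candidate models: a straight line, or a linear spline with one knot.
candidates = {"no knot": [], "knot at 10": [10.0],
              "knot at 20": [20.0], "knot at 40": [40.0]}
scores = {name: cv_score(x, y, knots) for name, knots in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name:>10s}: CV deviance per observation = {score:.4f}")
print("lowest score:", best)
```

Scores for competing models may be close to one another, which mirrors the abstract's observation that no uniquely 'best' model need emerge; in that case it seems more informative to compare the fitted curves of the near‐optimal models than to report only the minimizer.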