Reduction of selection bias in genomewide studies by resampling
- 10 March 2005
- journal article
- research article
- Published by Wiley in Genetic Epidemiology
- Vol. 28 (4) , 352-367
- https://doi.org/10.1002/gepi.20068
Abstract
The accuracy of gene localization, the reliability of locus-specific effect estimates, and the ability to replicate initial claims of linkage and/or association have emerged as major methodological concerns in genomewide studies of complex diseases and quantitative traits. To address the issue of multiple comparisons inherent in genomewide studies, the use of stringent criteria for assessing statistical significance has been generally acknowledged as a strategy to control type I error. However, the application of genomewide significance criteria does not take account of the selection bias introduced into parameter estimates, e.g., estimates of locus-specific effect size of disease/trait loci. Some have argued that reliable locus-specific parameter estimates can only be obtained in an independent sample. In this report, we examine statistical resampling techniques, including cross-validation and the bootstrap, applied to the initial sample to improve the estimation of locus-specific effects. We compare them with the naïve method in which all data are used for both hypothesis testing and parameter estimation, as well as with the split-sample approach in which part of the data are reserved for estimation. Upward bias of the naïve estimator and inadequacy of the split-sample approach are derived analytically under a simple quantitative trait model. Simulation studies of the resampling methods are performed for both the simple model and a more realistic genomewide linkage analysis. Our results suggest that cross-validation and bootstrap methods can substantially reduce the estimation bias, especially when the effect size is small or there is no genetic effect. Genet. Epidemiol.
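The selection bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' model or code; it assumes a deliberately simplified setting in which the trait values at a single locus are normal with mean `mu` (the "effect") and unit variance, and significance is declared by a one-sided z-test. The function name `simulate`, its parameters, and the threshold of 2.0 (far less stringent than a genomewide criterion) are all hypothetical choices made for illustration. It compares only the naïve estimator with the split-sample estimator; the cross-validation and bootstrap procedures studied in the paper are not reproduced here.

```python
import random


def _mean(values):
    """Arithmetic mean, or NaN when no replicate was significant."""
    return sum(values) / len(values) if values else float("nan")


def simulate(mu, n, z_crit, n_rep, seed=0):
    """Average effect estimates among replicates declared significant.

    Returns (naive_mean, split_mean):
      naive_mean - test and estimate on the same n observations
      split_mean - test on the first half, estimate on the second half
    Assumed model (illustrative only): y_i ~ Normal(mu, 1).
    """
    rng = random.Random(seed)
    half = n // 2
    naive, split = [], []
    for _ in range(n_rep):
        y = [rng.gauss(mu, 1.0) for _ in range(n)]

        # Naive: the full sample is used for both testing and estimation,
        # so only samples with a large observed mean are ever reported.
        m_all = _mean(y)
        if m_all * n ** 0.5 > z_crit:
            naive.append(m_all)

        # Split-sample: selection uses the first half only; the estimate
        # comes from the untouched second half.
        m_test = _mean(y[:half])
        if m_test * half ** 0.5 > z_crit:
            split.append(_mean(y[half:]))
    return _mean(naive), _mean(split)


if __name__ == "__main__":
    naive_est, split_est = simulate(mu=0.1, n=100, z_crit=2.0, n_rep=10000, seed=1)
    print(f"true effect: 0.1  naive: {naive_est:.3f}  split-sample: {split_est:.3f}")
```

Under these assumptions the naïve average sits well above the true effect, because it is computed only from samples that passed the test, while the split-sample average stays close to it. The price is that both the test and the estimate use only half of the data, which lowers power and increases the variance of the estimate.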
This publication has 12 references indexed in Scilit:
- Upward Bias in Estimation of Genetic Effects. American Journal of Human Genetics, 2002
- Bias in Estimates of Quantitative-Trait–Locus Effect in Genome Scans: Demonstration of the Phenomenon and a Method-of-Moments Procedure for Reducing Bias. American Journal of Human Genetics, 2002
- Large Upward Bias in Estimation of Locus-Specific Effects from Genomewide Scans. American Journal of Human Genetics, 2001
- Haseman and Elston revisited. Genetic Epidemiology, 2000
- Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets. Statistics in Medicine, 2000
- Multipoint Quantitative-Trait Linkage Analysis in General Pedigrees. American Journal of Human Genetics, 1998
- Allele-Sharing Models: LOD Scores and Accurate Linkage Tests. American Journal of Human Genetics, 1997
- Improvements on Cross-Validation: The .632+ Bootstrap Method. Journal of the American Statistical Association, 1997
- Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nature Genetics, 1995
- Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation. Journal of the American Statistical Association, 1983