The Impact of Missing and Erroneous Genotypes on Tagging SNP Selection and Power of Subsequent Association Tests
- 1 April 2006
- journal article
- Published by S. Karger AG in Human Heredity
- Vol. 61 (1), 31-44
- https://doi.org/10.1159/000092141
Abstract
Single nucleotide polymorphisms (SNPs) serve as effective markers for localizing disease susceptibility genes, but current genotyping technologies are inadequate for genotyping all available SNP markers in a typical linkage/association study. Much attention has recently been paid to methods for selecting the minimal informative subset of SNPs in identifying haplotypes, but there has been little investigation of the effect of missing or erroneous genotypes on the performance of these SNP selection algorithms and subsequent association tests using the selected tagging SNPs. The purpose of this study is to explore the effect of missing genotypes or genotyping errors on tagging SNP selection and subsequent single marker and haplotype association tests using the selected tagging SNPs. Through two sets of simulations, we evaluated the performance of three tagging SNP selection programs in the presence of missing or erroneous genotypes: Clayton's diversity based program htstep, Carlson's linkage disequilibrium (LD) based program ldSelect, and Stram's coefficient of determination based program tagsnp.exe. When randomly selected known loci were relabeled as 'missing', we found that the average number of tagging SNPs selected by all three algorithms changed very little and the power of subsequent single marker and haplotype association tests using the selected tagging SNPs remained close to the power of these tests in the absence of missing genotypes. When random genotyping errors were introduced, we found that the average number of tagging SNPs selected by all three algorithms increased. In data sets simulated according to the haplotype frequencies in the CYP19 region, Stram's program had a larger increase than Carlson's and Clayton's programs. In data sets simulated under the coalescent model, Carlson's program had the largest increase and Clayton's program had the smallest increase.
In both sets of simulations, in the presence of genotyping errors, the power of the haplotype tests from all three programs decreased quickly, but there was not much reduction in power of the single marker tests. Missing genotypes do not seem to have much impact on tagging SNP selection and subsequent single marker and haplotype association tests. In contrast, genotyping errors could have a severe impact on tagging SNP selection and haplotype tests, but not on single marker tests.
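The two perturbations described in the abstract (relabeling randomly selected known genotypes as 'missing', and introducing random genotyping errors) can be illustrated with a minimal Python sketch. This is not the authors' simulation code. The 0/1/2 genotype coding, the `None` sentinel for a missing call, the uniform error model, and the function names `mask_missing` and `inject_errors` are all assumptions made for illustration only.

```python
import random

MISSING = None          # assumed sentinel for a missing genotype call
GENOTYPES = (0, 1, 2)   # assumed coding: copies of the minor allele


def mask_missing(genotypes, rate, rng):
    """Return a copy in which each call is independently relabeled
    as missing with probability `rate`."""
    return [
        [MISSING if rng.random() < rate else g for g in row]
        for row in genotypes
    ]


def inject_errors(genotypes, rate, rng):
    """Return a copy in which each non-missing call is, with probability
    `rate`, replaced by a different genotype drawn uniformly at random.
    The uniform error model is an assumption, not the paper's model."""
    perturbed = []
    for row in genotypes:
        new_row = []
        for g in row:
            if g is not MISSING and rng.random() < rate:
                g = rng.choice([x for x in GENOTYPES if x != g])
            new_row.append(g)
        perturbed.append(new_row)
    return perturbed


if __name__ == "__main__":
    rng = random.Random(2006)
    # Toy data: 5 individuals (rows) by 4 SNPs (columns).
    data = [[rng.choice(GENOTYPES) for _ in range(4)] for _ in range(5)]
    with_missing = mask_missing(data, 0.05, rng)
    with_errors = inject_errors(data, 0.05, rng)
    n_missing = sum(g is MISSING for row in with_missing for g in row)
    n_changed = sum(
        a != b for ra, rb in zip(data, with_errors) for a, b in zip(ra, rb)
    )
    print("calls relabeled as missing:", n_missing)
    print("calls altered by error:", n_changed)
```

In a study of this kind, the perturbed matrices would then be passed to a tagging SNP selection program and the resulting tag sets compared with those obtained from the unperturbed data.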