The Impact of Missing and Erroneous Genotypes on Tagging SNP Selection and Power of Subsequent Association Tests

1 April 2006

journal article
Published by S. Karger AG in Human Heredity

Vol. 61 (1) , 31-44
https://doi.org/10.1159/000092141

Abstract

Single nucleotide polymorphisms (SNPs) serve as effective markers for localizing disease susceptibility genes, but current genotyping technologies are inadequate for genotyping all available SNP markers in a typical linkage/association study. Much attention has recently been paid to methods for selecting the minimal informative subset of SNPs for identifying haplotypes, but there has been little investigation of the effect of missing or erroneous genotypes on the performance of these SNP selection algorithms and of subsequent association tests using the selected tagging SNPs. The purpose of this study is to explore the effect of missing genotypes or genotyping errors on tagging SNP selection and on subsequent single marker and haplotype association tests using the selected tagging SNPs. Through two sets of simulations, we evaluated the performance of three tagging SNP selection programs in the presence of missing or erroneous genotypes: Clayton's diversity based program htstep, Carlson's linkage disequilibrium (LD) based program ldSelect, and Stram's coefficient of determination based program tagsnp.exe. When randomly selected known loci were relabeled as 'missing', we found that the average number of tagging SNPs selected by all three algorithms changed very little and the power of subsequent single marker and haplotype association tests using the selected tagging SNPs remained close to the power of these tests in the absence of missing genotypes. When random genotyping errors were introduced, we found that the average number of tagging SNPs selected by all three algorithms increased. In data sets simulated according to the haplotype frequencies in the CYP19 region, Stram's program had a larger increase than Carlson's and Clayton's programs. In data sets simulated under the coalescent model, Carlson's program had the largest increase and Clayton's program had the smallest increase.
In both sets of simulations, in the presence of genotyping errors, the power of the haplotype tests from all three programs decreased quickly, but there was not much reduction in the power of the single marker tests. Missing genotypes do not seem to have much impact on tagging SNP selection and subsequent single marker and haplotype association tests. In contrast, genotyping errors could have a severe impact on tagging SNP selection and haplotype tests, but not on single marker tests.
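
The two perturbations described in the abstract, relabeling known genotypes as missing and introducing random genotyping errors, could be sketched roughly as follows. This is only an illustrative sketch and not the authors' simulation code: the 0/1/2 genotype coding, the function names `mask_missing` and `add_errors`, the uniform error model, and the rates used are all assumptions, since the abstract does not specify them.

```python
import random

MISSING = None  # assumed placeholder for a genotype relabeled as 'missing'


def mask_missing(genotypes, rate, rng):
    """Return a copy in which each genotype is independently
    relabeled as MISSING with probability `rate`."""
    return [[MISSING if rng.random() < rate else g for g in row]
            for row in genotypes]


def add_errors(genotypes, rate, rng):
    """Return a copy in which each non-missing genotype is, with
    probability `rate`, replaced by one of the other two genotype
    codes chosen uniformly (an assumed error model)."""
    out = []
    for row in genotypes:
        new_row = []
        for g in row:
            if g is not MISSING and rng.random() < rate:
                g = rng.choice([x for x in (0, 1, 2) if x != g])
            new_row.append(g)
        out.append(new_row)
    return out


# Hypothetical toy data: 200 individuals, 10 SNPs, genotypes coded as
# allele counts 0, 1, or 2 (not the CYP19 or coalescent data of the study).
rng = random.Random(0)
genotypes = [[rng.choice((0, 1, 2)) for _ in range(10)] for _ in range(200)]

with_missing = mask_missing(genotypes, 0.05, rng)  # assumed 5% missing rate
with_errors = add_errors(genotypes, 0.02, rng)     # assumed 2% error rate
```

In a study of this kind, the perturbed data would then be passed to each tagging SNP selection program and the resulting tag sets compared against those obtained from the unperturbed data.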
