Non‐random error in genotype calling procedures: Implications for family‐based and case–control genome‐wide association studies
Open Access
- 28 July 2008
- journal article
- research article
- Published by Wiley in American Journal of Medical Genetics Part B: Neuropsychiatric Genetics
- Vol. 147B (8), 1379-1386
- https://doi.org/10.1002/ajmg.b.30836
Abstract
The considerable data‐handling requirements of genome‐wide association studies (GWAS) prohibit individual calling of genotypes and create a reliance on sophisticated “genotype‐calling algorithms.” Despite their obvious utility, current genotyping platforms and calling algorithms have limitations. Specifically, some genotypes are not called because the data are ambiguous, and any bias in the missing data could create spurious results. Using data from the Genetic Association Information Network (GAIN), we observed that missing genotypes are not randomly distributed across the homozygous and heterozygous groups. Using simulation, we examined whether the level and type of missingness observed might produce deviation from the null hypothesis under common case–control and family‐based statistical approaches. Under a case–control model in which missingness is present in the case group but not in the controls, we observed bias giving rise to genome‐wide significant type‐I error for missingness as low as 3%. The family‐based association simulations showed close to nominal type‐I error at 4% genotype missingness. These findings have important implications for study design, quality‐control procedures and reporting of findings in GWAS.
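The case–control mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation design, which the abstract does not specify. It assumes, purely for illustration, that only heterozygous calls fail and that they fail only in cases. The function names, sample size, minor allele frequency and the choice of an allelic chi-square test are all hypothetical. Cases and controls are drawn from the same population, so every rejection is a type‐I error.

```python
import math
import random


def draw_genotypes(n, maf, rng):
    """Draw n genotypes (minor-allele counts 0, 1, 2) under Hardy-Weinberg."""
    return [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]


def drop_heterozygotes(genotypes, het_missing_rate, rng):
    """Assumed missingness model: only heterozygous calls fail, at the given rate."""
    return [g for g in genotypes
            if not (g == 1 and rng.random() < het_missing_rate)]


def allelic_chi2(cases, controls):
    """1-df chi-square test on the 2x2 allele-count table; returns (stat, p)."""
    a = sum(cases)                 # minor alleles in cases
    b = 2 * len(cases) - a         # major alleles in cases
    c = sum(controls)              # minor alleles in controls
    d = 2 * len(controls) - c      # major alleles in controls
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def type1_error(het_missing_rate, n_per_group=2000, maf=0.3,
                alpha=0.05, reps=200, seed=1):
    """Fraction of null replicates rejected at level alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Cases and controls share the same allele frequency (the null),
        # but only the case genotypes are subject to missingness.
        cases = drop_heterozygotes(
            draw_genotypes(n_per_group, maf, rng), het_missing_rate, rng)
        controls = draw_genotypes(n_per_group, maf, rng)
        _, p = allelic_chi2(cases, controls)
        hits += p < alpha
    return hits / reps
```

Comparing `type1_error(0.0)` with `type1_error(0.1)` shows how the rejection rate changes once heterozygotes start dropping out of the case group. Under this assumed model the overall missingness in cases is roughly the heterozygote frequency times `het_missing_rate`, so the per-genotype rate is not directly comparable to the 3% and 4% figures quoted in the abstract.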