Appropriate data cleaning methods for genome-wide association study
Open Access
- 12 August 2008
- journal article
- research article
- Published by Springer Nature in Journal of Human Genetics
- Vol. 53 (10) , 886-893
- https://doi.org/10.1007/s10038-008-0322-y
Abstract
Genome-wide association studies (GWAS) using a large number of single nucleotide polymorphisms (SNPs) have successfully been applied to identify genetic variants of common diseases. However, genotyping using the new array technologies is often associated with spurious results that could unfavorably affect analyses of GWAS. Consequently, data cleaning is of paramount importance in excluding spurious genotyping results. In this study, we investigated the criteria required for the appropriate cleaning of 389 unrelated healthy Japanese samples analyzed using the GeneChip Human Mapping 500K Array Set for GWAS. The samples were randomly subdivided into two groups, and the allele frequencies in the groups were compared for individual SNPs as a quasi-case-control study. Then, observed results were filtered by four parameters (SNP call rate, confidence score obtained using the Bayesian Robust Linear Model with Mahalanobis genotype-calling algorithm, Hardy-Weinberg equilibrium, and minor allele frequency) and assessed for deviation from the null hypothesis. We found that appropriate data cleaning could be achieved using these four parameters. Our findings offer an avenue for obtaining appropriate data from GWAS.
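The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes genotypes are coded as 0, 1, or 2 copies of one allele, with `None` for a no-call, and that the genotype-calling confidence-score threshold has already been applied upstream so that low-confidence calls arrive as `None`. All function names are hypothetical, and the default thresholds in `passes_filters` are illustrative placeholders, since the abstract does not state the values the study selected.

```python
import math
import random


def chi2_sf_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def call_rate(genotypes):
    """Fraction of samples with a called (non-None) genotype."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def allele_counts(genotypes):
    """Return (count of allele A, count of allele B) over called genotypes."""
    called = [g for g in genotypes if g is not None]
    b = sum(called)
    return 2 * len(called) - b, b


def minor_allele_frequency(genotypes):
    a, b = allele_counts(genotypes)
    return min(a, b) / (a + b) if (a + b) else 0.0


def hwe_p_value(genotypes):
    """Chi-square goodness-of-fit test against Hardy-Weinberg proportions."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return 1.0
    p = sum(called) / (2 * n)  # frequency of allele B
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: the test is undefined
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    observed = [called.count(0), called.count(1), called.count(2)]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2_sf_1df(stat)


def allelic_p_value(group_1, group_2):
    """2x2 chi-square test comparing allele counts between two groups."""
    a1, b1 = allele_counts(group_1)
    a2, b2 = allele_counts(group_2)
    margins = (a1 + b1, a2 + b2, a1 + a2, b1 + b2)
    if 0 in margins:
        return 1.0
    n = a1 + b1 + a2 + b2
    stat = n * (a1 * b2 - a2 * b1) ** 2 / math.prod(margins)
    return chi2_sf_1df(stat)


def passes_filters(genotypes, min_call_rate=0.95, min_maf=0.05, min_hwe_p=0.001):
    """Apply the SNP-level filters; threshold defaults are assumptions."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf
            and hwe_p_value(genotypes) >= min_hwe_p)


def split_samples(n_samples, rng):
    """Randomly divide sample indices into two groups (quasi-cases, quasi-controls)."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    half = n_samples // 2
    return indices[:half], indices[half:]


if __name__ == "__main__":
    rng = random.Random(0)
    n_samples, n_snps = 389, 2000
    group_1, group_2 = split_samples(n_samples, rng)
    p_values = []
    for _ in range(n_snps):
        freq = rng.uniform(0.05, 0.5)
        # Simulated genotypes under Hardy-Weinberg equilibrium (synthetic data).
        snp = [(rng.random() < freq) + (rng.random() < freq) for _ in range(n_samples)]
        if passes_filters(snp):
            p_values.append(allelic_p_value([snp[i] for i in group_1],
                                            [snp[i] for i in group_2]))
    # Under the null hypothesis, roughly 5% of p-values should fall below 0.05.
    below = sum(p < 0.05 for p in p_values)
    print(f"{below} of {len(p_values)} filtered SNPs have p < 0.05")
```

Because both groups come from the same healthy population, no true association is expected, so any excess of small p-values after filtering points to residual genotyping artifacts rather than biology. That is what makes the random split usable as a check on the cleaning criteria.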