An Arabidopsis Example of Association Mapping in Structured Samples
Open Access
- 1 January 2007
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Genetics
- Vol. 3 (1) , e4
- https://doi.org/10.1371/journal.pgen.0030004
Abstract
A potentially serious disadvantage of association mapping is the fact that marker-trait associations may arise from confounding population structure as well as from linkage to causative polymorphisms. Using genome-wide marker data, we have previously demonstrated that the problem can be severe in a global sample of 95 Arabidopsis thaliana accessions, and that established methods for controlling for population structure are generally insufficient. Here, we use the same sample together with a number of flowering-related phenotypes and data-perturbation simulations to evaluate a wider range of methods for controlling for population structure. We find that, in terms of reducing the false-positive rate while maintaining statistical power, a recently introduced mixed-model approach that takes genome-wide differences in relatedness into account via estimated pairwise kinship coefficients generally performs best. By combining the association results with results from linkage mapping in F2 crosses, we identify one previously known true positive and several promising new associations, but also demonstrate the existence of both false positives and false negatives. Our results illustrate the potential of genome-wide association scans as a tool for dissecting the genetics of natural variation, while at the same time highlighting the pitfalls. The importance of study design is clear; our study is severely under-powered both in terms of sample size and marker density. Our results also provide a striking demonstration of confounding by population structure. While statistical methods can be used to ameliorate this problem, they cannot always be effective and are certainly not a substitute for independent evidence, such as that obtained via crosses or transgenic experiments. Ultimately, association mapping is a powerful tool for identifying a list of candidates that is short enough to permit further genetic study. 
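For orientation, a mixed model of the kind referred to above is often written in the general form below. This is a sketch in commonly used notation, not necessarily the exact parameterization used in the paper; all symbols are assumptions introduced here for illustration.

```latex
\begin{align}
  y &= X\beta + u + e, \\
  u &\sim \mathcal{N}\!\left(0,\ \sigma_g^2 K\right), \qquad
  e \sim \mathcal{N}\!\left(0,\ \sigma_e^2 I\right).
\end{align}
```

Here $y$ is the vector of phenotypes, $X\beta$ collects the fixed effects (including the marker being tested), $u$ is a random genetic background effect whose covariance is proportional to the matrix $K$ of estimated pairwise kinship coefficients, and $e$ is residual noise. Some formulations scale the kinship term as $2K$. Because related accessions share correlated background effects, a marker that merely tracks relatedness is less likely to be credited with the trait difference.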
There is currently tremendous interest in using association mapping to find the genes responsible for natural variation, particularly for human disease. In association mapping, researchers seek to identify regions of the genome where individuals who are phenotypically similar (e.g., they all have the same disease) are also unusually closely related. A potentially serious problem is that spurious correlations may arise if the population is structured so that members of a subgroup tend to be much more closely related. We have previously demonstrated that this problem can be severe in Arabidopsis thaliana, and that established statistical methods for controlling for population structure are insufficient. Here, we evaluate a broader range of methods. We find that a recently introduced mixed-model approach generally performs best. By combining the association results with results from linkage mapping in F2 crosses, we identify one previously known true positive and several promising new associations, but also demonstrate the existence of both false positives and false negatives. Our results illustrate the potential of genome-wide association scans as a tool for dissecting the genetics of natural variation, while at the same time highlighting the pitfalls.
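The spurious correlations described above can be illustrated with a small simulation. The sketch below is not the paper's analysis and not the mixed-model approach. It is a simpler stratified adjustment (centering within known subgroups), used only to show how structure alone can create a marker-trait association. The two subgroups, their allele frequencies, the trait means, and all function names are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def center_within_groups(values, groups):
    """Subtract each subgroup's mean from the values of its members."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def simulate(n_per_group=200, seed=1):
    """Two subgroups that differ in allele frequency and in trait mean.

    The marker has NO causal effect on the trait (assumed toy setup).
    """
    rng = random.Random(seed)
    groups, marker, trait = [], [], []
    # (allele frequency, trait mean) per subgroup: hypothetical values
    for g, (freq, mean) in enumerate([(0.1, 0.0), (0.9, 2.0)]):
        for _ in range(n_per_group):
            groups.append(g)
            marker.append(1 if rng.random() < freq else 0)
            trait.append(rng.gauss(mean, 1.0))
    return groups, marker, trait


groups, marker, trait = simulate()

# Naive test: ignores structure, so the marker looks associated.
naive_r = pearson(marker, trait)

# Stratified test: compare only within subgroups.
adjusted_r = pearson(
    center_within_groups(marker, groups),
    center_within_groups(trait, groups),
)

print(f"naive r = {naive_r:.3f}, within-group r = {adjusted_r:.3f}")
```

In this toy setup the naive correlation is substantial even though the marker is not causal, while the within-group correlation is close to zero. Real samples rarely fall into cleanly labeled subgroups, which is one reason to model relatedness as continuous instead.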