Leveraging Hierarchical Population Structure in Discrete Association Studies
Open Access
- 4 July 2007
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 2 (7) , e591
- https://doi.org/10.1371/journal.pone.0000591
Abstract
Population structure can confound the identification of correlations in biological data. Such confounding has been recognized in multiple biological disciplines, resulting in a disparate collection of proposed solutions. We examine several methods that correct for confounding on discrete data with hierarchical population structure and identify two distinct confounding processes, which we call coevolution and conditional influence. We describe these processes in terms of generative models and show that these generative models can be used to correct for the confounding effects. Finally, we apply the models to three problems: identification of escape mutations in HIV-1 in response to specific HLA-mediated immune pressure, prediction of coevolving residues in an HIV-1 peptide, and a search for genotypes that are associated with bacterial resistance traits in Arabidopsis thaliana. We show that coevolution is a better description of confounding in some applications and conditional influence is better in others. That is, we show that no single method is best for addressing all forms of confounding. Analysis tools based on these models are available on the internet as both web-based applications and downloadable source code at http://atom.research.microsoft.com/bio/phylod.aspx.
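The confounding described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's generative models; it is a minimal example, assuming two hypothetical clades (`clade_a`, `clade_b`) with made-up counts, in which two binary traits are exactly independent within each clade yet appear associated once the clades are pooled. The phi coefficient is used here only as a convenient measure of association for a 2x2 table.

```python
from math import sqrt

def phi(n11, n10, n01, n00):
    """Phi coefficient (association strength) of a 2x2 table of counts."""
    num = n11 * n00 - n10 * n01
    den = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return num / den if den else 0.0

# Hypothetical counts (n11, n10, n01, n00) for traits X and Y in two clades.
# Clade A: both traits are common (80%); clade B: both are rare (20%).
# Within each clade the counts factorize, so X and Y are independent there.
clade_a = (64, 16, 16, 4)
clade_b = (4, 16, 16, 64)

# Ignoring population structure means summing the two tables.
pooled = tuple(a + b for a, b in zip(clade_a, clade_b))

phi_a = phi(*clade_a)
phi_b = phi(*clade_b)
phi_pooled = phi(*pooled)

print("within clade A:", phi_a)
print("within clade B:", phi_b)
print("pooled:", phi_pooled)
```

The pooled table shows an association that neither clade shows on its own, because clade membership drives both trait frequencies. Stratifying by clade removes it in this toy case; hierarchical structure with many nested clades is harder to handle with simple stratification, which is part of the motivation for model-based corrections.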