Interpreting principal component analyses of spatial population genetic variation
- 20 April 2008
- journal article
- research article
- Published by Springer Nature in Nature Genetics
- Vol. 40 (5) , 646-649
- https://doi.org/10.1038/ng.139
Abstract
Nearly 30 years ago, Cavalli-Sforza et al. pioneered the use of principal component analysis (PCA) in population genetics and used PCA to produce maps summarizing human genetic variation across continental regions [1]. They interpreted gradient and wave patterns in these maps as signatures of specific migration events [1,2,3]. These interpretations have been controversial [4,5,6,7], but influential [8], and the use of PCA has become widespread in analysis of population genetics data [9,10,11,12,13]. However, the behavior of PCA for genetic data showing continuous spatial variation, such as might exist within human continental groups, has been less well characterized. Here, we find that gradients and waves observed in Cavalli-Sforza et al.'s maps resemble sinusoidal mathematical artifacts that arise generally when PCA is applied to spatial data, implying that the patterns do not necessarily reflect specific migration events. Our findings aid interpretation of PCA results and suggest how PCA can help correct for continuous population structure in association studies.
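The claim that sinusoidal patterns arise generally when PCA is applied to spatial data can be illustrated with a small numerical sketch. The following is not the authors' analysis. It assumes a toy model (demes evenly spaced on a line, covariance decaying exponentially with distance) and compares the leading principal components with plain cosine waves. The function name `sinusoid_match`, the parameters `n_demes` and `scale`, and the exponential decay form are all illustrative assumptions.

```python
import numpy as np

def sinusoid_match(n_demes=50, scale=20.0, n_components=3):
    """Compare leading PCs of a distance-decay covariance with cosine waves."""
    # Assumed toy model: demes evenly spaced on a line, with covariance
    # between demes decaying exponentially with their distance.
    pos = np.arange(n_demes)
    cov = np.exp(-np.abs(pos[:, None] - pos[None, :]) / scale)

    # PCA removes the mean across demes; double-centering applies the
    # same operation directly to the covariance matrix.
    center = np.eye(n_demes) - np.full((n_demes, n_demes), 1.0 / n_demes)
    cov_centered = center @ cov @ center

    # eigh returns eigenvalues in ascending order; reverse the columns
    # so that the leading principal components come first.
    eigvals, eigvecs = np.linalg.eigh(cov_centered)
    pcs = eigvecs[:, ::-1][:, :n_components]

    matches = []
    for k in range(1, n_components + 1):
        # Cosine wave with k half-periods across the habitat, unit length.
        wave = np.cos(np.pi * k * (pos + 0.5) / n_demes)
        wave /= np.linalg.norm(wave)
        # Absolute cosine similarity (the sign of a PC is arbitrary).
        matches.append(abs(float(pcs[:, k - 1] @ wave)))
    return matches

if __name__ == "__main__":
    for k, m in enumerate(sinusoid_match(), start=1):
        print(f"PC{k}: |cosine similarity| with cosine wave k={k}: {m:.3f}")
```

In this sketch no migration event is simulated at all: the only input is covariance that falls off with distance. Similarities close to 1 therefore indicate that gradient-like and wave-like components can emerge from the spatial structure alone, which is the point the abstract makes.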
This publication has 36 references indexed in Scilit:
- Palaeogenetic evidence supports a dual model of Neolithic spreading into Europe. Proceedings of the Royal Society B: Biological Sciences, 2007
- Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 2007
- Measuring European Population Stratification with Microarray Genotype Data. American Journal of Human Genetics, 2007
- An African origin for the intimate association between humans and Helicobacter pylori. Nature, 2007
- An Arabidopsis Example of Association Mapping in Structured Samples. PLoS Genetics, 2007
- Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics, 2006
- A General Population-Genetic Model for the Production by Population Structure of Spurious Genotype–Phenotype Associations in Discrete, Admixed or Spatially Distributed Populations. Genetics, 2006
- Origins and evolution of the Europeans' genome: evidence from multiple microsatellite loci. Proceedings of the Royal Society B: Biological Sciences, 2006
- Population Structure and Eigenanalysis. PLoS Genetics, 2006
- Tracing the Origin and Spread of Agriculture in Europe. PLoS Biology, 2005