The computer program STRUCTURE does not reliably identify the main genetic clusters within species: simulations and implications for human population structure
- 4 August 2010
- journal article
- Published by Springer Nature in Heredity
- Vol. 106 (4), 625–632
- https://doi.org/10.1038/hdy.2010.95
Abstract
One of the primary goals of population genetics is to succinctly describe genetic relationships among populations, and the computer program STRUCTURE is one of the most frequently used tools for doing so. The mathematical model used by STRUCTURE was designed to sort individuals into Hardy–Weinberg populations, but the program is also frequently used to group individuals from a large number of populations into a small number of clusters that are supposed to represent the main genetic divisions within species. In this study, I used computer simulations to examine how well STRUCTURE accomplishes this latter task. Simulations of populations that had a simple hierarchical history of fragmentation showed that when there were relatively long divergence times within evolutionary lineages, the clusters created by STRUCTURE were frequently not consistent with the evolutionary history of the populations. These difficulties can be attributed to forcing STRUCTURE to place individuals into too few clusters. Simulations also showed that the clusters produced by STRUCTURE can be strongly influenced by variation in sample size. In some circumstances, STRUCTURE simply put all of the individuals from the largest sample in the same cluster. A reanalysis of human population structure suggests that the problems I identified with STRUCTURE in simulations may have obscured relationships among human populations, particularly the genetic similarity between Europeans and some African populations.
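The idea of sorting individuals into Hardy–Weinberg populations can be illustrated with a much-simplified assignment calculation. The sketch below is not STRUCTURE's code. It assumes the no-admixture case, unlinked loci, and cluster allele frequencies that are already known. STRUCTURE itself estimates frequencies and memberships jointly by MCMC. The function names `genotype_log_prob` and `assignment_probs` are hypothetical. For each candidate cluster, the code multiplies the Hardy–Weinberg genotype probabilities across loci and then normalizes over the K clusters supplied.

```python
import math


def genotype_log_prob(genotype, freqs):
    """Log-probability of a diploid genotype (a, b) at one locus under
    Hardy-Weinberg proportions, given a dict of allele frequencies.
    Frequencies are assumed strictly positive (log(0) would fail)."""
    a, b = genotype
    pa, pb = freqs[a], freqs[b]
    if a == b:
        return math.log(pa * pa)       # homozygote: p^2
    return math.log(2.0 * pa * pb)     # heterozygote: 2pq


def assignment_probs(individual, cluster_freqs, prior=None):
    """Posterior probability that `individual` (one genotype per locus)
    came from each cluster. Assumes no admixture, unlinked loci, and
    known allele frequencies (hypothetical simplification of STRUCTURE)."""
    k = len(cluster_freqs)
    if prior is None:
        prior = [1.0 / k] * k          # uniform prior over the K clusters
    log_post = []
    for c in range(k):
        lp = math.log(prior[c])
        for locus, genotype in enumerate(individual):
            lp += genotype_log_prob(genotype, cluster_freqs[c][locus])
        log_post.append(lp)
    # Normalize in log space for numerical stability
    m = max(log_post)
    weights = [math.exp(x - m) for x in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Usage: two hypothetical clusters, two biallelic loci
cluster_freqs = [
    [{"A": 0.9, "a": 0.1}, {"B": 0.8, "b": 0.2}],  # cluster 0
    [{"A": 0.2, "a": 0.8}, {"B": 0.3, "b": 0.7}],  # cluster 1
]
individual = [("A", "A"), ("B", "b")]
probs = assignment_probs(individual, cluster_freqs)
```

The posterior is normalized over only the K clusters supplied. An individual is therefore always assigned somewhere, even when none of the clusters resembles its true population. This is one way to see why forcing too few clusters can yield groupings that do not track evolutionary history.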