CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure
Open Access
- 7 May 2007
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 23 (14) , 1801-1806
- https://doi.org/10.1093/bioinformatics/btm233
Abstract
Motivation: Clustering of individuals into populations on the basis of multilocus genotypes is informative in a variety of settings. In population-genetic clustering algorithms, such as BAPS, STRUCTURE and TESS, individual multilocus genotypes are partitioned over a set of clusters, often using unsupervised approaches that involve stochastic simulation. As a result, replicate cluster analyses of the same data may produce several distinct solutions for estimated cluster membership coefficients, even though the same initial conditions were used. Major differences among clustering solutions have two main sources: (1) ‘label switching’ of clusters across replicates, caused by the arbitrary way in which clusters in an unsupervised analysis are labeled, and (2) ‘genuine multimodality,’ truly distinct solutions across replicates.
Results: To facilitate the interpretation of population-genetic clustering results, we describe three algorithms for aligning multiple replicate analyses of the same data set. We have implemented these algorithms in the computer program CLUMPP (CLUster Matching and Permutation Program). We illustrate the use of CLUMPP by aligning the cluster membership coefficients from 100 replicate cluster analyses of 600 chickens from 20 different breeds.
Availability: CLUMPP is freely available at http://rosenberglab.bioinformatics.med.umich.edu/clumpp.html
Contact: mjakob@umich.edu
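To make the label-switching problem concrete, the following Python sketch aligns one replicate's membership coefficients to another by trying every permutation of the cluster labels and keeping the one with the smallest summed squared difference. It is an illustration only: the function name `align_replicate`, the squared-difference criterion, and the toy matrices are assumptions introduced here, and the abstract does not specify the similarity measure or search strategies that CLUMPP itself uses.

```python
from itertools import permutations


def align_replicate(reference, replicate):
    """Relabel the clusters of `replicate` to best match `reference`.

    Both arguments are lists of rows, one row per individual, each row
    holding K membership coefficients. Returns (best_perm, aligned), where
    aligned[i][j] == replicate[i][best_perm[j]].

    Illustrative criterion (an assumption, not CLUMPP's statistic):
    minimize the summed squared difference over all individuals and clusters.
    """
    k = len(reference[0])
    best_perm, best_cost = None, float("inf")
    # Exhaustive search over all K! relabelings; feasible only for small K.
    for perm in permutations(range(k)):
        cost = sum(
            (ref_row[j] - rep_row[perm[j]]) ** 2
            for ref_row, rep_row in zip(reference, replicate)
            for j in range(k)
        )
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    aligned = [[row[j] for j in best_perm] for row in replicate]
    return best_perm, aligned


# Toy data: two replicates that agree up to a swap of the cluster labels.
run_a = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
run_b = [[0.1, 0.9], [0.25, 0.75], [0.85, 0.15]]

perm, aligned_b = align_replicate(run_a, run_b)
print(perm)       # relabeling applied to run_b's columns
print(aligned_b)  # run_b with columns reordered to match run_a
```

In this toy case the two runs differ only by label switching, so a single permutation brings them into close agreement. If no permutation produced a close match, that would point toward the second source of disagreement described above, genuine multimodality. Because the number of permutations grows as K!, an exhaustive search like this one is practical only for a small number of clusters.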