The Gibbs and split–merge sampler for population mixture analysis from genetic data with incomplete baselines
- 1 March 2006
- journal article
- Published by Canadian Science Publishing in Canadian Journal of Fisheries and Aquatic Sciences
- Vol. 63 (3), 576-596
- https://doi.org/10.1139/f05-224
Abstract
Although population mixtures often include contributions from novel populations as well as from baseline populations previously sampled, unlabeled mixture individuals can be separated to their sources from genetic data. A Gibbs and split–merge Markov chain Monte Carlo sampler is described for successively partitioning a genetic mixture sample into plausible subsets of individuals from each of the baseline and extra-baseline populations present. The subsets are selected to satisfy the Hardy–Weinberg and linkage equilibrium conditions expected for large, panmictic populations. The number of populations present can be inferred from the distribution for counts of subsets per partition drawn by the sampler. To further summarize the sampler's output, co-assignment probabilities of mixture individuals to the same subsets are computed from the partitions and are used to construct a binary tree of their relatedness. The tree graphically displays the clusters of mixture individuals together with a quantitative measure of the evidence supporting their various separate and common sources. The methodology is applied to several simulated and real data sets to illustrate its use and demonstrate the sampler's superior performance.
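The two output summaries named in the abstract, the distribution of subset counts per partition and the pairwise co-assignment probabilities, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the sampler's draws are already available as lists of integer subset labels, one label per mixture individual. The function name `summarize_partitions` and the toy draws are hypothetical, and the sampler itself is not shown.

```python
from collections import Counter


def summarize_partitions(partitions):
    """Summarize sampled partitions of a mixture sample.

    partitions: list of draws; each draw is a list of subset labels,
    where draw[i] is the subset that individual i was placed in.
    (Hypothetical input format, assumed for illustration only.)
    """
    n_draws = len(partitions)
    n_ind = len(partitions[0])

    # Relative frequency of the number of distinct subsets per draw.
    k_counts = Counter(len(set(draw)) for draw in partitions)
    k_dist = {k: c / n_draws for k, c in sorted(k_counts.items())}

    # Count how often each pair of individuals shares a subset.
    together = [[0] * n_ind for _ in range(n_ind)]
    for draw in partitions:
        for i in range(n_ind):
            for j in range(n_ind):
                if draw[i] == draw[j]:
                    together[i][j] += 1

    # Convert counts to co-assignment probabilities.
    co_assign = [[c / n_draws for c in row] for row in together]
    return k_dist, co_assign


# Toy example: 4 draws over 4 individuals (made-up labels, not real data).
draws = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 2, 2],
]
k_dist, co_assign = summarize_partitions(draws)
print(k_dist)           # share of draws with 2 subsets and with 3 subsets
print(co_assign[0][1])  # fraction of draws placing individuals 0 and 1 together
```

One way to obtain a tree from such a matrix is to pass one minus the co-assignment probability, as a dissimilarity, to an agglomerative clustering routine. The abstract does not state which tree-building rule the authors use, so that step is left out here.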