STATISTICAL ANALYSIS OF HETEROZYGOSITY DATA: INDEPENDENT SAMPLE COMPARISONS
- 31 May 1985
- Vol. 39 (3), 623-637
- https://doi.org/10.1111/j.1558-5646.1985.tb00399.x
Abstract
The distribution of mean heterozygosities under an infinite allele model with constant mutation rate was examined through simulation studies. It was found that, although the variance of the distribution decreases with increasing numbers of loci examined as expected, the shape of the distribution may remain skewed or bimodal. The distribution becomes symmetrical for increasing mean heterozygosity levels and numbers of loci. As a result, parametric statistical tests may not be valid for making comparisons among populations or species. Independent sample t-tests were examined in detail to determine the frequency of rejection of the null hypothesis when pairs of samples are drawn from populations with the same mean heterozygosity. Differing numbers of loci and levels of mean heterozygosity were examined. For mean heterozygosity levels above 7.5%, t-tests provide the proper rejection rate, with as few as five loci. When mean heterozygosity is as low as 2.5%, the t-test is conservative even when 40 loci are examined in each population. Independent sample t-tests were then examined for their power to detect true differences between populations as the degree of difference and number of loci vary. Although large differences can be found with high certainty, differences on the order of 5% heterozygosity may require that large numbers of loci (>40) be examined in order to be 80% or more certain of detecting them. In addition, it is emphasized that, for small numbers of loci (<25), the statistical detection of differences of interesting magnitude requires that relatively rare sampling events occur and that much larger differences be observed among the samples than exist for the population means. Two reasons exist for the lack of sensitivity of the test procedures. First, when mean heterozygosity levels are low, the non-normality of the sample means is perhaps most important. Second, even when mean heterozygosity levels are high or when sample sizes are large enough so sample means are approximately normally distributed, the intrinsically high interlocus variance of heterozygosity estimates makes the tests insensitive to the presence of heterozygosity differences that might be biologically meaningful. Finally, the implications of the results of this study are discussed with regard to observed low levels of correlation between heterozygosity and other explanatory variables.
Funding Information
- National Science Foundation (DEB 81‐070138)
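The kind of rejection-rate and power experiment described in the abstract can be illustrated with a small simulation. The sketch below is not the study's procedure, which is not given here. It assumes each locus is an independent draw from the stationary distribution of a neutral infinite allele model, sampled by stick-breaking with Beta(1, theta) fractions, where theta = H / (1 - H) is chosen so that the expected heterozygosity equals H. It also works with population-level heterozygosities and ignores the sampling of individuals. The function names `locus_heterozygosity` and `rejection_rate` are hypothetical.

```python
import numpy as np
from scipy import stats


def locus_heterozygosity(theta, rng, tol=1e-12):
    """Draw the heterozygosity of one locus (assumed neutral infinite allele model).

    Allele frequencies are generated by stick-breaking: each new allele takes a
    Beta(1, theta) fraction of the frequency mass that is still unassigned.
    """
    remaining = 1.0
    homozygosity = 0.0
    while remaining > tol:
        p = remaining * rng.beta(1.0, theta)
        homozygosity += p * p
        remaining -= p
    return 1.0 - homozygosity


def rejection_rate(mean_h_a, mean_h_b, n_loci, n_trials=2000, alpha=0.05, seed=0):
    """Fraction of independent-sample t-tests that reject at level alpha.

    With mean_h_a == mean_h_b this estimates the false rejection rate;
    with unequal means it estimates power.
    """
    rng = np.random.default_rng(seed)
    theta_a = mean_h_a / (1.0 - mean_h_a)
    theta_b = mean_h_b / (1.0 - mean_h_b)
    rejections = 0
    for _ in range(n_trials):
        a = [locus_heterozygosity(theta_a, rng) for _ in range(n_loci)]
        b = [locus_heterozygosity(theta_b, rng) for _ in range(n_loci)]
        p_value = stats.ttest_ind(a, b).pvalue
        # A degenerate pair (all loci monomorphic) gives a NaN p-value,
        # which compares False here and is counted as a non-rejection.
        if p_value < alpha:
            rejections += 1
    return rejections / n_trials


if __name__ == "__main__":
    for n_loci in (5, 25, 40):
        null_rate = rejection_rate(0.025, 0.025, n_loci, n_trials=500)
        power = rejection_rate(0.05, 0.10, n_loci, n_trials=500)
        print(f"loci={n_loci:3d}  null rejection={null_rate:.3f}  power={power:.3f}")
```

Because the per-locus distribution is strongly skewed at low heterozygosity, this setup is enough to explore how the rejection rate at a nominal 5% level changes with the number of loci. The numbers it produces depend on the assumptions above and should not be read as reproducing the published results.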