STATISTICAL ANALYSIS OF HETEROZYGOSITY DATA: INDEPENDENT SAMPLE COMPARISONS

31 May 1985

journal article
research article
Published by Wiley in Evolution

Vol. 39 (3), 623-637
https://doi.org/10.1111/j.1558-5646.1985.tb00399.x

Abstract

The distribution of mean heterozygosities under an infinite allele model with constant mutation rate was examined through simulation studies. It was found that, although the variance of the distribution decreases with increasing numbers of loci examined as expected, the shape of the distribution may remain skewed or bimodal. The distribution becomes symmetrical for increasing mean heterozygosity levels and numbers of loci. As a result, parametric statistical tests may not be valid for making comparisons among populations or species. Independent sample t-tests were examined in detail to determine the frequency of rejection of the null hypothesis when pairs of samples are drawn from populations with the same mean heterozygosity. Differing numbers of loci and levels of mean heterozygosity were examined. For mean heterozygosity levels above 7.5%, t-tests provide the proper rejection rate, with as few as five loci. When mean heterozygosity is as low as 2.5%, the t-test is conservative even when 40 loci are examined in each population. Independent sample t-tests were then examined for their power to detect true differences between populations as the degree of difference and number of loci vary. Although large differences can be found with high certainty, differences on the order of 5% heterozygosity may require that large numbers of loci (>40) be examined in order to be 80% or more certain of detecting them. In addition, it is emphasized that, for small numbers of loci (<25), the statistical detection of differences of interesting magnitude requires that relatively rare sampling events occur and that much larger differences be observed among the samples than exist for the population means. Two reasons exist for the lack of sensitivity of the test procedures. First, when mean heterozygosity levels are low, the non-normality of the sample means is perhaps most important. Second, even when mean heterozygosity levels are high or when sample sizes are large enough so sample means are approximately normally distributed, the intrinsically high interlocus variance of heterozygosity estimates makes the tests insensitive to the presence of heterozygosity differences that might be biologically meaningful. Finally, the implications of the results of this study are discussed with regard to observed low levels of correlation between heterozygosity and other explanatory variables.
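
The kind of simulation the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' procedure, whose details are not given here. It assumes each locus is an independent draw from the neutral infinite allele equilibrium, generated with a Hoppe urn so that the expected heterozygosity is theta / (1 + theta). It then applies a pooled-variance independent sample t-test to the per-locus values and counts how often the null hypothesis is rejected at the 5% level. The function names, the 50 genes sampled per locus, the replicate count, and the small table of critical values are all assumptions made for illustration.

```python
import math
import random
import statistics

# Two-sided 5% critical values of Student's t, tabulated only for the
# degrees of freedom used in this sketch (5, 20, or 40 loci per sample).
T_CRIT_05 = {8: 2.306, 38: 2.024, 78: 1.991}


def locus_heterozygosity(theta, n_genes, rng):
    """Draw one locus with a Hoppe urn and return its unbiased gene diversity.

    Assumption: the locus is at neutral infinite allele equilibrium.
    """
    counts = []  # number of sampled genes carrying each allele
    for i in range(n_genes):
        if rng.random() < theta / (theta + i):
            counts.append(1)  # a new allele arises
        else:
            # Copy the allele of a uniformly chosen earlier gene.
            r = rng.randrange(i)
            acc = 0
            for k, c in enumerate(counts):
                acc += c
                if r < acc:
                    counts[k] += 1
                    break
    homozygosity = sum((c / n_genes) ** 2 for c in counts)
    return n_genes / (n_genes - 1) * (1.0 - homozygosity)


def pooled_t(x, y):
    """Student's t statistic for two independent samples, pooled variance."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    if sp2 == 0.0:
        # Both samples are constant (typically all loci monomorphic);
        # this sketch treats that case as no evidence of a difference.
        return 0.0
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(
        sp2 * (1.0 / nx + 1.0 / ny))


def rejection_rate(h1, h2, n_loci, n_genes=50, reps=2000, seed=1):
    """Fraction of replicate sample pairs for which |t| exceeds the 5% cutoff.

    h1 and h2 are the expected heterozygosities of the two populations.
    """
    rng = random.Random(seed)
    theta1, theta2 = h1 / (1.0 - h1), h2 / (1.0 - h2)
    crit = T_CRIT_05[2 * n_loci - 2]
    rejections = 0
    for _ in range(reps):
        x = [locus_heterozygosity(theta1, n_genes, rng) for _ in range(n_loci)]
        y = [locus_heterozygosity(theta2, n_genes, rng) for _ in range(n_loci)]
        if abs(pooled_t(x, y)) > crit:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    # Same mean in both populations: estimates the rejection rate under the null.
    print(rejection_rate(0.025, 0.025, n_loci=5))
    # Different means: estimates the power to detect the difference.
    print(rejection_rate(0.05, 0.10, n_loci=40))
```

Calling `rejection_rate` with equal means gives an estimate of the type I error rate, and calling it with unequal means gives an estimate of power. The estimates depend on the assumed sampling scheme and seed, so they should be read as qualitative illustrations rather than reproductions of the published figures.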

Funding Information

National Science Foundation (DEB 81‐070138)
