Abstract
Simple but exact statistical tests for detecting a cluster of associated nucleotide changes in DNA are presented. The tests are based on the linear distribution of a set of s sites among a total of n sites, where the s sites may be the variable sites, sites of insertion/deletion, or sites categorized in some other way. These tests are especially useful for detecting gene conversion and intragenic recombination in a sample of DNA sequences. In this case, the sites of interest are those that correspond to particular ways of splitting the sequences into two groups (e.g., sequences A and D vs. sequences B, C, and E-J). Each such split is termed a phylogenetic partition. Application of these methods to a well-documented case of gene conversion in human gamma-globin genes shows that sites corresponding to two of the three observed partitions are significantly clustered, whereas application to hominoid mitochondrial DNA sequences (among which no recombination is expected to occur) shows no evidence of such clustering. This indicates that clustering of partition-specific sites is largely due to intragenic recombination or gene conversion. Alternative hypotheses explaining the observed clustering of sites, such as biased selection or mutation, are discussed.
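As a rough illustration of what an exact test on the linear distribution of s sites among n sites can look like, the sketch below computes the probability that s sites, placed uniformly at random among n positions, span a distance of d0 or less. The choice of statistic (the distance between the first and last of the s sites), the uniform null model, and the names `span_tail_probability`, `n`, `s`, and `d0` are assumptions made for this example; the abstract does not specify which statistics the tests use.

```python
from math import comb


def span_tail_probability(n: int, s: int, d0: int) -> float:
    """Exact P(span <= d0) for s sites drawn uniformly from n positions.

    Assumed statistic (not taken from the abstract): the span is the
    position of the last site minus the position of the first site.
    """
    if not 2 <= s <= n:
        raise ValueError("need 2 <= s <= n")
    d0 = min(d0, n - 1)  # the span can never exceed n - 1
    # For a given span d there are (n - d) placements of the two end sites
    # and comb(d - 1, s - 2) ways to place the remaining s - 2 sites
    # strictly between them.
    favorable = sum((n - d) * comb(d - 1, s - 2) for d in range(s - 1, d0 + 1))
    return favorable / comb(n, s)


if __name__ == "__main__":
    # Hypothetical numbers: 5 partition-specific sites among 100 sites,
    # observed within a span of 10 positions.
    print(span_tail_probability(n=100, s=5, d0=10))
```

A small value would suggest that the s sites lie closer together than expected under random placement. Because the probability is obtained by direct enumeration rather than by an approximation, the calculation stays valid when s is small.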