Abstract
A randomization procedure is proposed to determine whether data sets used for phylogenetic analysis contain phylogenetically nonrandom information. The method compares the observed number of steps on a minimum length tree with the mean number of steps on minimum length trees derived from the same data set after character state assignments have been randomly permuted within each character. Such randomized data sets exhibit exactly the same character state distributions as the original data but contain no phylogenetic information. In tests of 28 separate data sets using this procedure, the minimum length for each data set differed significantly from that expected for phylogenetically non-informative data, even though observed consistency indices from the original data were as low as 0.230. The high correlation between the number of steps per character on minimum length trees and the number of taxa among the 28 original data sets is consistent with that expected if a more or less constant frequency of homoplasy occurs per character per taxon. This correlation implies that the consistency index may be an inappropriate comparative measure of homoplasy among data sets. The observed pattern of increasing homoplasy with increasing numbers of taxa for the original data sets is curvilinear (when forced to pass through a fixed point for all data sets). This is qualitatively different from that expected for random data. Possible uses of the randomization technique in cladistic studies using either compatibility analysis or parsimony are suggested.
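The permutation step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `permute_within_characters` and `randomization_test` are hypothetical, the data matrix is assumed to be a list of rows (taxa) by columns (characters), and `min_tree_length` is a placeholder for whatever parsimony search the analyst uses to obtain the length of a minimum length tree. The one-sided p-value convention shown is also an assumption; the abstract states only that observed lengths are compared with the mean length from permuted data.

```python
import random
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[str]]  # rows = taxa, columns = characters


def permute_within_characters(matrix: Sequence[Sequence[str]],
                              rng: random.Random) -> Matrix:
    """Shuffle state assignments among taxa, independently for each character.

    Each column keeps exactly the same state frequencies as the original;
    only the association between states and taxa is randomized.
    """
    n_taxa = len(matrix)
    n_chars = len(matrix[0])
    shuffled = [list(row) for row in matrix]  # copy; the input is not modified
    for j in range(n_chars):
        column = [matrix[i][j] for i in range(n_taxa)]
        rng.shuffle(column)
        for i in range(n_taxa):
            shuffled[i][j] = column[i]
    return shuffled


def randomization_test(matrix: Sequence[Sequence[str]],
                       min_tree_length: Callable[[Matrix], float],
                       n_reps: int = 99,
                       seed: int = 0) -> Tuple[float, float, float]:
    """Compare the observed minimum length with lengths from permuted data.

    `min_tree_length` is supplied by the caller (hypothetical placeholder for
    a parsimony search). Returns (observed, mean of permuted lengths, p-value).
    """
    rng = random.Random(seed)
    observed = min_tree_length([list(row) for row in matrix])
    null_lengths = [
        min_tree_length(permute_within_characters(matrix, rng))
        for _ in range(n_reps)
    ]
    mean_null = sum(null_lengths) / n_reps
    # Assumed convention: fraction of data sets (observed included) whose
    # minimum length is at most the observed length.
    p_value = (1 + sum(length <= observed for length in null_lengths)) / (n_reps + 1)
    return observed, mean_null, p_value
```

Because the minimum length tree must be found anew for every permuted data set, the cost of the procedure is dominated by the parsimony search passed in as `min_tree_length`, so the number of replicates is a practical trade-off rather than a fixed part of the method.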