How Many Genes Should a Systematist Sample? Conflicting Insights from a Phylogenomic Matrix Characterized by Replicated Incongruence

Abstract
The average size of molecular systematic data sets has grown steadily over the past 20 years. Combined phylogenetic matrices that include multiple genetic loci are now the norm, and in many cases, rapid compilation of extremely large DNA data sets is feasible. Thus, a frequently asked question is “How many genes should a systematist sequence in order to generate a robust phylogenetic hypothesis?” This question has generally been addressed by computer simulation, where the amount of virtual DNA sequence data that can be generated is unlimited (e.g., Huelsenbeck and Hillis, 1993). Genomic data, however, provide systematists with a wealth of empirical molecular data for phylogenetic analysis, and several authors have taken advantage of this resource to examine the effects of increasing the number of genes to quantities that seemed impossible in the recent past (e.g., Cummings et al., 1995; Bapteste et al., 2002; Goremykin, 2004).