Species Choice for Comparative Genomics: Being Greedy Works

Abstract
Several projects investigating genetic function and evolution through sequencing and comparison of multiple genomes are now underway. These projects consume many resources, and appropriate planning should be devoted to choosing which species to sequence, potentially involving cooperation among different sequencing centres. A widely discussed criterion for species choice is the maximisation of evolutionary divergence. Our mathematical formalization of this problem surprisingly shows that the best long-term cooperative strategy coincides with the seemingly short-term “greedy” strategy of always choosing the next best single species. Other criteria influencing species choice, such as medical relevance or sequencing costs, can also be accommodated in our approach, suggesting our results' broad relevance in scientific policy decisions. What would happen if sequencing centres around the world were to choose genomes without consulting each other and without devising long-term strategies? When several parties are involved in decisions with interacting consequences, experience teaches that cooperation and planning are usually necessary to guarantee the best result. Similarly, in computer science, “greedy” algorithms—which construct solutions by iteratively taking the best immediate choice—are rarely the best option to solve a problem. The authors show, however, that in the context of choosing species for comparative genomics, cooperation and planning can be kept to a minimum without affecting the quality of the global result: a greedy algorithm applied to the problem of maximising the evolutionary divergence among species chosen from a known phylogeny is proven to guarantee optimal solutions.