Simulating evolution by gene duplication of protein features that require multiple amino acid residues

Abstract
Gene duplication is thought to be a major source of evolutionary innovation because it allows one copy of a gene to mutate and explore genetic space while the other copy continues to fulfill the original function. Models of the process often implicitly assume that a single mutation to the duplicated gene can confer a new selectable property. Yet some protein features, such as disulfide bonds or ligand binding sites, require the participation of two or more amino acid residues, which could require several mutations. Here we model the evolution of such protein features by what we consider to be the conceptually simplest route: point mutation in duplicated genes. We show that for very large population sizes N, where at steady state in the absence of selection the population would be expected to contain one or more duplicated alleles coding for the feature, the time to fixation in the population hovers near the inverse of the point mutation rate, and varies sluggishly with the λth root of 1/N, where λ is the number of nucleotide positions that must be mutated to produce the feature. At smaller population sizes, the time to fixation varies linearly with 1/N and exceeds the inverse of the point mutation rate. We conclude that, in general, to be fixed in 10^8 generations, the production of novel protein features that require the participation of two or more amino acid residues simply by multiple point mutations in duplicated genes would entail population sizes of no less than 10^9.
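
The two scaling regimes stated in the abstract can be illustrated with a minimal Python sketch. It computes only the relative change in time to fixation when the population size N is multiplied by some factor, using the proportionalities given above: the λth root of 1/N for very large populations, and 1/N for smaller ones. The function name, the regime labels, and the example values are hypothetical and chosen for illustration. The sketch does not reproduce the model itself, its prefactors, or the population size at which one regime gives way to the other.

```python
def relative_time_change(pop_factor: float, lam: int, regime: str) -> float:
    """Factor by which time to fixation changes when N is multiplied by pop_factor.

    Illustrative only: encodes the proportionalities stated in the abstract,
    not the underlying model or its constants.
      regime "large": time scales as (1/N) ** (1/lam)
      regime "small": time scales as 1/N
    """
    if pop_factor <= 0:
        raise ValueError("pop_factor must be positive")
    if lam < 1:
        raise ValueError("lam must be at least 1")
    if regime == "large":
        return pop_factor ** (-1.0 / lam)
    if regime == "small":
        return 1.0 / pop_factor
    raise ValueError("regime must be 'large' or 'small'")


if __name__ == "__main__":
    # Hypothetical example: a 100-fold increase in population size.
    for lam in (2, 3, 6):
        large = relative_time_change(100.0, lam, "large")
        small = relative_time_change(100.0, lam, "small")
        print(f"lambda={lam}: large-N factor={large:.3f}, small-N factor={small:.3f}")
```

Under these assumptions, the benefit of a larger population shrinks in the large-N regime as λ grows, which is the sense in which the abstract describes the dependence as sluggish, while in the small-N regime the time falls in direct proportion to the increase in N.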