Identifying Functionally Important Mutations from Phenotypically Diverse Sequence Data

Abstract
Here we present a simple statistical method to determine the phenotypic contribution of a single mutation from libraries of mutants with diverse phenotypes, in which each mutant contains multiple mutations. The central premise of this method is that, given M phenotypic classes, mutations that do not affect the phenotype should partition among the M classes according to a multinomial distribution. Deviations from this distribution indicate a link between specific mutations and phenotypes. We suggest that this method will aid the engineering of functional nucleic acids, proteins, and other biomolecules by uncovering target sites for rational mutagenesis. As a proof of principle, we show how the method can be used to deduce the individual effects of mutations in a set of 69 P_L-λ promoter variants. Each of these promoters was generated by error-prone PCR and incorporated numerous mutations. The activity of the promoters was assayed using flow cytometry to measure the fluorescence of a green fluorescent protein reporter gene. Our analysis of the sequences of these mutants revealed seven positions having a statistically significant correlation with promoter activity. Using site-directed mutagenesis, we constructed point mutations at several sites, both statistically significant and insignificant, as well as combinations of these sites. Our results show that the statistical method correctly identified the phenotypic effects of these mutations. We suggest that this method may be useful for expediting directed evolution experiments by allowing both desired and undesired mutations to be identified and incorporated between rounds of mutagenesis.
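The multinomial premise can be illustrated with a short sketch in Python. This is not necessarily the exact procedure used in this work; it is one possible way to test a single sequence position, under assumptions that are ours: the null class probabilities are taken to be proportional to the number of mutants in each phenotypic class, the p-value is an exact multinomial test (the summed probability of all partitions no more likely than the observed one), and the class sizes and counts in the example are hypothetical, not data from the promoter library. The function names `multinomial_pmf`, `compositions`, and `exact_multinomial_p` are illustrative.

```python
from math import factorial


def multinomial_pmf(counts, probs):
    """Probability of observing `counts` under a multinomial with `probs`."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # partial quotients stay integral
    p = float(coef)
    for c, q in zip(counts, probs):
        p *= q ** c
    return p


def compositions(n, m):
    """Yield every tuple of m non-negative integers that sums to n."""
    if m == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, m - 1):
            yield (first,) + rest


def exact_multinomial_p(counts, probs):
    """Exact test: total probability of all partitions that are
    no more likely than the observed one under the null."""
    observed = multinomial_pmf(counts, probs)
    total = 0.0
    for outcome in compositions(sum(counts), len(counts)):
        p = multinomial_pmf(outcome, probs)
        if p <= observed * (1 + 1e-9):  # tolerance for float rounding
            total += p
    return min(total, 1.0)


# Hypothetical library: 69 mutants split evenly into M = 3 activity classes.
class_sizes = [23, 23, 23]
null_probs = [s / sum(class_sizes) for s in class_sizes]

# Hypothetical position mutated in 8 mutants, 7 of them in the first class.
p_skewed = exact_multinomial_p([7, 1, 0], null_probs)

# Hypothetical position whose 8 mutations are spread evenly across classes.
p_even = exact_multinomial_p([3, 3, 2], null_probs)

print(f"skewed partition: p = {p_skewed:.4f}")
print(f"even partition:   p = {p_even:.4f}")
```

A small p-value flags a position whose mutations cluster in particular phenotypic classes more than chance would predict. Because every mutated position is tested, a multiple-comparison correction would normally be applied before calling any position significant. The exhaustive enumeration is only practical for small counts and few classes.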