Abstract
In 1990, Frank Wright introduced a method for measuring synonymous codon usage bias in a gene by estimating the “effective number of codons,” Nc. Several attempts have been made recently to improve Wright's estimate of Nc, but the methods that remain applicable when a gene encodes a protein lacking some of the amino acids with degenerate codons have not been tested against each other. In this article, I derive five new estimators of Nc and test them together with the two published estimators, using resampling under rigorous testing conditions. Estimation of codon homozygosity, F, turns out to be key to the estimation of Nc. F can be estimated in two closely related ways, corresponding to sampling with or without replacement, the latter being what Wright used. The Nc methods based on sampling without replacement showed much better accuracy at short gene lengths than those based on sampling with replacement, indicating that Wright's homozygosity method is superior. Surprisingly, the methods based on sampling with replacement displayed a superior correlation with mRNA levels in Escherichia coli.
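
To illustrate the distinction between the two ways of estimating F, the following is a minimal Python sketch for a single amino acid, given the observed counts of each of its synonymous codons. It assumes the standard reading of the two estimates: with replacement, F is the sum of squared codon frequencies; without replacement, F is the probability that two codons drawn from the sample without replacement are identical, which is algebraically the same as Wright's (n Σp² − 1)/(n − 1). The function names and the example counts are hypothetical and chosen only for illustration; they are not the estimators derived in this article.

```python
def homozygosity_with_replacement(counts):
    """F as the sum of squared codon frequencies (sampling with replacement)."""
    n = sum(counts)
    if n == 0:
        raise ValueError("at least one codon observation is required")
    return sum((c / n) ** 2 for c in counts)


def homozygosity_without_replacement(counts):
    """F as the chance that two codons drawn without replacement are identical.

    Equivalent to (n * sum(p_i**2) - 1) / (n - 1), the form used by Wright.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two codon observations are required")
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))


# Hypothetical counts for a two-fold degenerate amino acid seen four times.
example_counts = [3, 1]
print(homozygosity_with_replacement(example_counts))     # 0.625
print(homozygosity_without_replacement(example_counts))  # 0.5
```

The two estimates differ most when n is small, as in the four-codon example above, and converge as n grows, which is consistent with the choice of estimator mattering mainly at short gene lengths.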