Discriminating between homodimeric and monomeric proteins in the crystalline state

Abstract
Scores calculated from the intermolecular contacts of proteins in the crystalline state are used to distinguish monomeric from homodimeric proteins: each protein is assigned to one of two categories separated by a cut-off score. The generalized classification error is estimated by bootstrap re-sampling on a nonredundant set of 172 water-soluble proteins whose prevalent quaternary state in solution is known to be either monomeric or homodimeric. A statistical potential, based on atom-pair frequencies across the interfaces observed in homodimers, yields an error rate of 12.5%. This is a small but significant improvement over the solvent-accessible surface area buried in the contact interface, which achieves an error rate of 15.4%. A modified form of the buried-area measure, which relates the two most extensive contacts in the crystal, gives an even lower error rate of 11.1%.
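The cut-off classification and its bootstrap error estimate can be illustrated with a minimal sketch. It is not the procedure used in this work: the function names, the convention that a score at or above the cut-off means "homodimer", and the choice to test each bootstrap round on the samples left out of the resample are all assumptions made for illustration, and the scores in the usage lines are invented.

```python
import random


def best_cutoff(scores, labels):
    """Pick the cut-off with the fewest misclassifications on the given data.

    Assumed convention: score >= cutoff is called a homodimer (label 1),
    anything below is called a monomer (label 0).
    """
    values = sorted(set(scores))
    # Candidates: below all scores, midpoints between neighbours, above all scores.
    midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    candidates = [values[0] - 1.0] + midpoints + [values[-1] + 1.0]

    def n_errors(cutoff):
        return sum((s >= cutoff) != bool(y) for s, y in zip(scores, labels))

    return min(candidates, key=n_errors)


def error_rate(cutoff, scores, labels):
    """Fraction of samples placed in the wrong category by the cut-off."""
    wrong = sum((s >= cutoff) != bool(y) for s, y in zip(scores, labels))
    return wrong / len(scores)


def bootstrap_error(scores, labels, n_rounds=200, seed=0):
    """Average held-out error over bootstrap resamples (illustrative variant).

    Each round draws n samples with replacement, fits the cut-off on them,
    and tests it on the samples that were not drawn.
    """
    rng = random.Random(seed)
    n = len(scores)
    rates = []
    for _ in range(n_rounds):
        drawn = [rng.randrange(n) for _ in range(n)]
        drawn_set = set(drawn)
        held_out = [i for i in range(n) if i not in drawn_set]
        if not held_out:
            continue  # nothing left to test on in this round
        cutoff = best_cutoff([scores[i] for i in drawn],
                             [labels[i] for i in drawn])
        rates.append(error_rate(cutoff,
                                [scores[i] for i in held_out],
                                [labels[i] for i in held_out]))
    return sum(rates) / len(rates)


# Invented example scores: 0 = monomer, 1 = homodimer.
example_scores = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0]
example_labels = [0, 0, 0, 0, 1, 1, 1, 1]
estimated_error = bootstrap_error(example_scores, example_labels)
```

Fitting the cut-off and measuring the error on the same proteins would give an optimistic figure; testing on the proteins left out of each resample is one common way to approximate the error expected on unseen structures.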