Finding important sites in protein sequences

Abstract
A procedure is presented that uses sequence information from an aligned protein family to find sites that may be functionally or structurally critical to the protein. Sites are selected using features based on sequence conservation within subfamilies of the alignment and on associations between sites. The selected sites are then evaluated statistically, with a correction for phylogenetic bias in the collection of sequences. The method is applied to two families: the phycobiliproteins, which are light-harvesting proteins in cyanobacteria, red algae, and cryptomonads, and the globins, which function in oxygen storage and transport. The sites identified by the procedure are located in key structural positions and merit further experimental study.
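
The two kinds of features named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract gives no formulas, so Shannon entropy is assumed here as the conservation measure and mutual information as the measure of association between sites. The function names, the `max_entropy` threshold, and the toy sequences are all hypothetical, and the correction for phylogenetic bias is not attempted.

```python
from collections import Counter
from math import log2


def column(seqs, i):
    """Return the residues found at alignment position i."""
    return [s[i] for s in seqs]


def entropy(symbols):
    """Shannon entropy in bits; 0 means the column is fully conserved."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def subfamily_conserved_sites(subfamilies, max_entropy=0.0):
    """Sites conserved inside every subfamily but differing between subfamilies.

    Assumption: `subfamilies` maps a subfamily name to equal-length aligned
    sequences, and `max_entropy` is an illustrative threshold.
    """
    length = len(next(iter(subfamilies.values()))[0])
    hits = []
    for i in range(length):
        cols = [column(seqs, i) for seqs in subfamilies.values()]
        conserved = all(entropy(c) <= max_entropy for c in cols)
        consensus = {Counter(c).most_common(1)[0][0] for c in cols}
        if conserved and len(consensus) > 1:
            hits.append(i)
    return hits


def mutual_information(seqs, i, j):
    """Association between sites i and j: H(i) + H(j) - H(i, j), in bits."""
    a, b = column(seqs, i), column(seqs, j)
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


# Toy alignment (invented for illustration, not real protein data).
subfamilies = {
    "A": ["MKVLA", "MKVIA", "MKVLA"],
    "B": ["MRVLG", "MRVIG", "MRVLG"],
}
all_seqs = [s for seqs in subfamilies.values() for s in seqs]

print(subfamily_conserved_sites(subfamilies))
print(mutual_information(all_seqs, 1, 4))
```

In this toy alignment, positions 1 and 4 (0-based) are invariant inside each subfamily but differ between the two, and they vary together across the whole alignment. A raw score like this would also pick up sites that covary only because the sequences are closely related, which is the bias the statistical evaluation in the abstract is meant to correct.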
