Determining functional specificity from protein sequences
- 29 March 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (11), 2629-2635
- https://doi.org/10.1093/bioinformatics/bti396
Abstract
Given a large family of homologous protein sequences, many methods can divide the family into smaller groups that correspond to the different functions carried out by proteins within the family. One important problem, however, has been the absence of a general method for selecting an appropriate level of granularity, or size of the groups. We propose a consistent way of choosing the granularity that is independent of the sequence similarity and sequence clustering method used. We study three large, well-investigated protein families: basic leucine zippers, nuclear receptors and proteins with three consecutive C2H2 zinc fingers. Our method is tested against known functional information, the experimentally determined binding specificities, using a simple scoring method. The significance of the groups is also measured by randomizing the data. Finally, we compare our algorithm against a popular method of grouping proteins, the TRIBE-MCL method. In the end, we determine that dividing the families at the proposed level of granularity creates very significant and useful groups of proteins that correspond to the different DNA-binding motifs. We expect that such groupings will be useful in studying not only DNA binding but also other protein interactions.
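The abstract mentions scoring the groups against known binding specificities and measuring significance by randomizing the data, but it does not spell out either procedure. The sketch below is only one plausible reading, not the authors' method: it assumes a pair-agreement score (the fraction of protein pairs that the grouping and the known motifs treat the same way) and a label-shuffling permutation test. The function names, the protein identifiers and the motif labels are all hypothetical.

```python
import random
from itertools import combinations


def pair_agreement(groups, labels):
    """Fraction of protein pairs on which grouping and known labels agree.

    A pair agrees when it is placed together in both, or apart in both.
    This scoring scheme is an assumption, not taken from the paper.
    """
    ids = sorted(groups)
    pairs = list(combinations(ids, 2))
    agree = sum(
        (groups[a] == groups[b]) == (labels[a] == labels[b]) for a, b in pairs
    )
    return agree / len(pairs)


def permutation_p_value(groups, labels, n_shuffles=1000, seed=0):
    """Estimate how often shuffled labels score at least as well as the real ones."""
    rng = random.Random(seed)
    observed = pair_agreement(groups, labels)
    ids = sorted(groups)
    shuffled_values = [labels[i] for i in ids]
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled_values)
        shuffled = dict(zip(ids, shuffled_values))
        if pair_agreement(groups, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_shuffles + 1)


# Hypothetical toy data: cluster assignments and known binding motifs.
groups = {"p1": 0, "p2": 0, "p3": 0, "p4": 1, "p5": 1, "p6": 1, "p7": 2, "p8": 2}
labels = {"p1": "A", "p2": "A", "p3": "A", "p4": "B", "p5": "B", "p6": "B",
          "p7": "C", "p8": "C"}

observed, p_value = permutation_p_value(groups, labels)
print(f"agreement={observed:.3f}, p~{p_value:.4f}")
```

Shuffling the labels rather than the sequences keeps the group sizes fixed, so the null distribution reflects only the chance that a grouping of this shape lines up with the known motifs.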