Protein comparison at the domain architecture level
Open Access
- 3 December 2009
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 10 (S15), 1-9
- https://doi.org/10.1186/1471-2105-10-S15-S5
Abstract
The general method for determining the function of a newly discovered protein is to transfer annotations from well-characterized homologous proteins. Approaches for selecting homologous proteins can broadly be classified as sequence-based or domain-based. Compared with sequence-based methods, domain-based methods have several advantages for identifying distant homology and homology among proteins with multiple domains. However, these methods are challenged by large families defined by 'promiscuous' (or 'mobile') domains. Here we present a measure of domain architecture similarity, called Weighted Domain Architecture Comparison (WDAC), which can be used to identify homologs of multidomain proteins. To distinguish these promiscuous domains from conventional protein domains, we assigned a weight score to each Pfam domain extracted from RefSeq proteins, based on its abundance and versatility. To measure the similarity of two domain architectures, we use cosine similarity, a similarity measure from information retrieval. We combined sequence similarity with domain architecture comparisons to identify proteins belonging to the same domain architecture. Using human and nematode proteomes, we compared WDAC with an unweighted domain architecture comparison method (DAC) to evaluate the effectiveness of domain weight scores. We found that WDAC is better at identifying homology among multidomain proteins. Our analysis indicates that considering domain weight scores in domain architecture comparisons improves protein homology identification. We developed a web-based server that allows users to compare their proteins with protein domain architectures.
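The abstract describes two steps: weighting each Pfam domain by its abundance and versatility, then scoring two domain architectures with cosine similarity. The sketch below illustrates that idea under stated assumptions. The weight formula in `versatility_weights` is hypothetical and is not the published WDAC definition; architectures are treated as unordered bags of domains; and the toy domain labels `P`, `A`, `B`, `C` are invented data, with `P` standing in for a promiscuous domain.

```python
import math
from collections import Counter


def versatility_weights(proteome):
    """HYPOTHETICAL weighting for illustration, not the published WDAC formula.

    abundance(d)   = number of proteins containing domain d
    versatility(d) = number of distinct other domains co-occurring with d
    weight(d)      = 1 / (1 + ln(abundance) + ln(1 + versatility))
    Abundant, versatile ("promiscuous") domains therefore get smaller weights.
    """
    abundance = Counter()
    partners = {}
    for arch in proteome:
        unique = set(arch)
        for d in unique:
            abundance[d] += 1
            partners.setdefault(d, set()).update(unique - {d})
    return {
        d: 1.0 / (1.0 + math.log(abundance[d]) + math.log(1 + len(partners[d])))
        for d in abundance
    }


def architecture_vector(domains, weights):
    """Weighted bag-of-domains vector; domains absent from `weights` get 1.0."""
    counts = Counter(domains)
    return {d: c * weights.get(d, 1.0) for d, c in counts.items()}


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy proteome (invented): "P" co-occurs with many partners, like a promiscuous domain.
toy_proteome = [["P", "A"], ["P", "B"], ["P", "C"], ["P", "A", "B"], ["A"]]

if __name__ == "__main__":
    w = versatility_weights(toy_proteome)
    query, target = ["P", "A"], ["P", "B"]
    plain = cosine_similarity(architecture_vector(query, {}), architecture_vector(target, {}))
    weighted = cosine_similarity(architecture_vector(query, w), architecture_vector(target, w))
    print(f"unweighted: {plain:.3f}  weighted: {weighted:.3f}")
```

In this toy example, the two architectures share only the promiscuous domain `P`. Their unweighted cosine similarity is 0.5; with the hypothetical weights it drops, which is the qualitative effect the abstract attributes to domain weight scores. Ignoring domain order is a simplification of this sketch.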