An initial strategy for comparing proteins at the domain architecture level
Open Access
- 12 July 2006
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 22 (17), 2081-2086
- https://doi.org/10.1093/bioinformatics/btl366
Abstract
Motivation: Ideally, only proteins that exhibit highly similar domain architectures should be compared with one another as homologues or be classified into a single family. By combining three different indices (the Jaccard index, the Goodman-Kruskal γ function and the domain duplicate index) into a single similarity measure, we propose a method for comparing proteins based on their domain architectures.
Results: Evaluation of the method using the eukaryotic orthologous groups of proteins (KOGs) database indicated that it allows the automatic and efficient comparison of multiple-domain proteins, which are usually refractory to classic approaches based on sequence similarity measures. As a case study, the PDZ and LRR_1 domains are used to demonstrate how proteins containing promiscuous domains can be clearly compared using our method. For the convenience of users, a web server was set up with three query interfaces: one to compare different domain architectures, one to compare proteins with domain(s), and one to identify the relationships among domain architectures within a given KOG from the Clusters of Orthologous Groups of Proteins database.
Conclusion: The approach we propose is suitable for estimating the similarity of domain architectures of proteins, especially those of multidomain proteins.
Availability:
Contact: linkui@bnu.edu.cn
Supplementary Information: Supplementary data are available at Bioinformatics online.
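As a rough illustration of how three such indices might be combined into one score, the following Python sketch compares two domain architectures given as ordered lists of domain names. Only the Jaccard index and the Goodman-Kruskal γ follow their standard textbook definitions. The abstract does not define the domain duplicate index or the way the indices are combined, so `duplicate_index`, the use of first-occurrence positions for γ, the weights in `combined_similarity`, and the two example architectures are all assumptions made for illustration and should not be read as the authors' method.

```python
from collections import Counter
from itertools import combinations


def jaccard_index(arch_a, arch_b):
    """Distinct shared domains divided by all distinct domains."""
    set_a, set_b = set(arch_a), set(arch_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def goodman_kruskal_gamma(arch_a, arch_b):
    """(concordant - discordant) / (concordant + discordant) over pairs of
    shared domains. Assumption: a domain's position is its first occurrence."""
    shared = set(arch_a) & set(arch_b)
    pos_a = {d: arch_a.index(d) for d in shared}
    pos_b = {d: arch_b.index(d) for d in shared}
    concordant = discordant = 0
    for d1, d2 in combinations(sorted(shared), 2):
        # Same relative order in both architectures counts as concordant.
        if (pos_a[d1] - pos_a[d2]) * (pos_b[d1] - pos_b[d2]) > 0:
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    # Assumption: gamma is 0 when fewer than two domains are shared.
    return (concordant - discordant) / total if total else 0.0


def duplicate_index(arch_a, arch_b):
    """Hypothetical stand-in for the domain duplicate index: agreement in
    copy number over shared domains (sum of minima / sum of maxima)."""
    count_a, count_b = Counter(arch_a), Counter(arch_b)
    shared = set(count_a) & set(count_b)
    if not shared:
        return 0.0
    smaller = sum(min(count_a[d], count_b[d]) for d in shared)
    larger = sum(max(count_a[d], count_b[d]) for d in shared)
    return smaller / larger


def combined_similarity(arch_a, arch_b, weights=(0.5, 0.25, 0.25)):
    """Weighted sum of the three indices. The weights are illustrative only."""
    w_jaccard, w_gamma, w_dup = weights
    return (w_jaccard * jaccard_index(arch_a, arch_b)
            + w_gamma * goodman_kruskal_gamma(arch_a, arch_b)
            + w_dup * duplicate_index(arch_a, arch_b))


# Made-up architectures built from the two domains named in the abstract.
arch_1 = ["LRR_1", "LRR_1", "PDZ"]
arch_2 = ["PDZ", "LRR_1"]
score = combined_similarity(arch_1, arch_2)
```

In this toy pair the two architectures share the same domain set, so the Jaccard index alone would treat them as identical. The order term and the copy-number term are what separate them, which is the kind of distinction a composite measure is meant to capture for proteins with promiscuous or repeated domains.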