Sensitive detection of sequence similarity using combinatorial pattern discovery: A challenging study of two distantly related protein families
- 13 October 2005
- journal article
- research article
- Published by Wiley in Proteins: Structure, Function, and Bioinformatics
- Vol. 61 (4), 926-937
- https://doi.org/10.1002/prot.20608
Abstract
We investigate the performance of combinatorial pattern discovery to detect remote sequence similarities in terms of both biological accuracy and computational efficiency for a pair of distantly related families, as a case study. The two families represent the cupredoxins and multicopper oxidases, both containing blue copper-binding domains. These families present a challenging case due to low sequence similarity, different local structure, and variable sequence conservation at their copper-binding active sites. In this study, we investigate a new approach for automatically identifying weak sequence similarities that is based on combinatorial pattern discovery. We compare its performance with a traditional, HMM-based scheme and obtain estimates for sensitivity and specificity of the two approaches. Our analysis suggests that pattern discovery methods can be substantially more sensitive in detecting remote protein relationships while at the same time guaranteeing high specificity. Proteins 2005.
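As an illustration of the two evaluation measures named in the abstract, the following minimal Python sketch computes sensitivity and specificity from a set of predicted family members and a labeled reference set. The function name, the sequence identifiers, and the toy data are hypothetical and are not taken from the paper, and the paper's own evaluation protocol may differ.

```python
def sensitivity_specificity(predicted, positives, negatives):
    """Return (sensitivity, specificity) for a set of predicted hits.

    predicted: IDs the method reports as family members
    positives: IDs known to belong to the family
    negatives: IDs known not to belong to the family
    """
    predicted = set(predicted)
    positives = set(positives)
    negatives = set(negatives)

    tp = len(predicted & positives)   # true members that were found
    fn = len(positives - predicted)   # true members that were missed
    fp = len(predicted & negatives)   # non-members wrongly reported
    tn = len(negatives - predicted)   # non-members correctly rejected

    # Guard against empty reference sets to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data with hypothetical sequence identifiers (not from the paper).
positives = {"seq1", "seq2", "seq3", "seq4"}
negatives = {"seq5", "seq6", "seq7", "seq8"}
hits = {"seq1", "seq2", "seq3", "seq5"}

sens, spec = sensitivity_specificity(hits, positives, negatives)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Under these definitions, a method can be compared against another (for example a pattern-based search against an HMM-based one) by running both on the same labeled sets and comparing the two resulting pairs of values.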