The effectiveness of position- and composition-specific gap costs for protein similarity searches
Open Access
- 1 July 2008
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 24 (13), i15-i23
- https://doi.org/10.1093/bioinformatics/btn171
Abstract
Motivation: The flexibility in gap cost enjoyed by hidden Markov models (HMMs) is expected to afford them better retrieval accuracy than position-specific scoring matrices (PSSMs). We attempt to quantify the effect of more general gap parameters by separately examining the influence of position- and composition-specific gap scores, as well as by comparing the retrieval accuracy of the PSSMs constructed using an iterative procedure to that of the HMMs provided by Pfam and SUPERFAMILY, curated ensembles of multiple alignments.
Results: We found that position-specific gap penalties have an advantage over uniform gap costs. We did not explore optimizing distinct uniform gap costs for each query. For Pfam, PSSMs iteratively constructed from seeds based on HMM consensus sequences perform equivalently to HMMs that were adjusted to have constant gap transition probabilities, albeit with much greater variance. We observed no effect of composition-specific gap costs on retrieval performance. These results suggest possible improvements to the PSI-BLAST protein database search program.
Availability: The scripts for performing evaluations are available upon request from the authors.
Contact: yyu@ncbi.nlm.nih.gov