Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches
Open Access
- 26 October 2006
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 34 (20), 5966-5973
- https://doi.org/10.1093/nar/gkl731
Abstract
Protein sequence database search programs may be evaluated both for their retrieval accuracy (the ability to separate meaningful from chance similarities) and for the accuracy of their statistical assessments of reported alignments. However, methods for improving statistical accuracy can degrade retrieval accuracy by discarding compositional evidence of sequence relatedness. This evidence may be preserved by combining essentially independent measures of alignment and compositional similarity into a unified measure of sequence similarity. A version of the BLAST protein database search program, modified to employ this new measure, outperforms the baseline program in both retrieval and statistical accuracy on ASTRAL, a SCOP-based test set.
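The abstract does not say how the two measures are combined. As a minimal sketch only, assuming each measure is expressed as a P-value and that the two are independent, one standard way to merge them is Fisher's method, which for two P-values reduces to the closed form p1·p2·(1 − ln(p1·p2)). The function and parameter names below are hypothetical and are not taken from the paper or from BLAST.

```python
import math


def combine_independent_p_values(p_alignment: float, p_composition: float) -> float:
    """Combine two independent P-values with Fisher's method (illustrative only).

    Assumption: both inputs are valid P-values in (0, 1] and are statistically
    independent. For two P-values, Fisher's statistic -2*ln(p1*p2) follows a
    chi-square distribution with 4 degrees of freedom, whose tail probability
    has the closed form used below.
    """
    for p in (p_alignment, p_composition):
        if not 0.0 < p <= 1.0:
            raise ValueError("P-values must lie in the interval (0, 1]")
    product = p_alignment * p_composition
    # Tail probability of the product of two independent uniform variables.
    return product * (1.0 - math.log(product))
```

In this sketch, a pair of sequences with a modest alignment P-value but a strikingly similar composition would receive a smaller combined P-value than a pair with the same alignment P-value and unremarkable composition, which is the kind of compositional evidence the abstract describes as worth preserving.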