A comparison of position‐specific score matrices based on sequence and structure alignments
- 1 February 2002
- journal article
- research article
- Published by Wiley in Protein Science
- Vol. 11 (2) , 361-370
- https://doi.org/10.1110/ps.19902
Abstract
Sequence comparison methods based on position‐specific score matrices (PSSMs) have proven a useful tool for recognition of the divergent members of a protein family and for annotation of functional sites. Here we investigate one of the factors that affects overall performance of PSSMs in a PSI‐BLAST search, the algorithm used to construct the seed alignment upon which the PSSM is based. We compare PSSMs based on alignments constructed by global sequence similarity (ClustalW and ClustalW‐pairwise), local sequence similarity (BLAST), and local structure similarity (VAST). To assess performance with respect to identification of conserved functional or structural sites, we examine the accuracy of the three‐dimensional molecular models predicted by PSSM‐sequence alignments. Using the known structures of those sequences as the standard of truth, we find that model accuracy varies with the algorithm used for seed alignment construction in the pattern local‐structure (VAST) > local‐sequence (BLAST) > global‐sequence (ClustalW). Using structural similarity of query and database proteins as the standard of truth, we find that PSSM recognition sensitivity depends primarily on the diversity of the sequences included in the alignment, with an optimum around 30–50% average pairwise identity. We discuss these observations, and suggest a strategy for constructing seed alignments that optimize PSSM‐sequence alignment accuracy and recognition sensitivity.Keywords
This publication has 41 references indexed in Scilit:
- SCOP: A structural classification of proteins database for the investigation of sequences and structuresPublished by Elsevier ,2006
- T-coffee: a novel method for fast and accurate multiple sequence alignment 1 1Edited by J. ThorntonJournal of Molecular Biology, 2000
- Enhanced genome annotation using structural profiles in the program 3D-PSSM 1 1Edited by J. ThorntonJournal of Molecular Biology, 2000
- Comparison of sequence profiles. Strategies for structural predictions using sequence informationProtein Science, 2000
- Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searchesJournal of Molecular Biology, 1999
- Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methodsJournal of Molecular Biology, 1998
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programsNucleic Acids Research, 1997
- Threading a database of protein coresProteins-Structure Function and Bioinformatics, 1995
- Structural Features can be Unconserved in Proteins with Similar Folds: An Analysis of Side-chain to Side-chain Contacts Secondary Structure and AccessibilityJournal of Molecular Biology, 1994
- CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choiceNucleic Acids Research, 1994