Performance evaluation of amino acid substitution matrices
- 1 September 1993
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 17 (1), 49-61
- https://doi.org/10.1002/prot.340170108
Abstract
Several choices of amino acid substitution matrices are currently available for searching and alignment applications. These choices were evaluated using the BLAST searching program, which is extremely sensitive to differences among matrices, and the Prosite catalog, which lists members of hundreds of protein families. Matrices derived directly from either sequence‐based or structure‐based alignments of distantly related proteins performed much better overall than extrapolated matrices based on the Dayhoff evolutionary model. Similar results were obtained with the FASTA searching program. Improved performance appears to be general rather than family‐specific, reflecting improved accuracy in scoring alignments. An implementation of a multiple matrix strategy was also tested. While no combination of three matrices performed as well as the single best matrix, BLOSUM 62, good results were obtained using a combination of sequence‐based and structure‐based matrices. This hybrid set of matrices is likely to be useful in certain situations. Our results illustrate the importance of matrix selection and value of a comprehensive approach to evaluation of protein comparison tools.