Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships
- 26 May 1998
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 95 (11), 6073–6078
- https://doi.org/10.1073/pnas.95.11.6073
Abstract
Pairwise sequence comparison methods have been assessed using proteins whose relationships are known reliably from their structures and functions, as described in the SCOP database [Murzin, A. G., Brenner, S. E., Hubbard, T. & Chothia, C. (1995) J. Mol. Biol. 247, 536–540]. The evaluation tested the programs BLAST [Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol. 215, 403–410], WU-BLAST2 [Altschul, S. F. & Gish, W. (1996) Methods Enzymol. 266, 460–480], FASTA [Pearson, W. R. & Lipman, D. J. (1988) Proc. Natl. Acad. Sci. USA 85, 2444–2448], and SSEARCH [Smith, T. F. & Waterman, M. S. (1981) J. Mol. Biol. 147, 195–197] and their scoring schemes. The error rate of all algorithms is greatly reduced by using statistical scores to evaluate matches rather than percentage identity or raw scores. The E-value statistical scores of SSEARCH and FASTA are reliable: the number of false positives found in our tests agrees well with the scores reported. However, the P-values reported by BLAST and WU-BLAST2 exaggerate significance by orders of magnitude. SSEARCH, FASTA with ktup = 1, and WU-BLAST2 perform best, and they are capable of detecting almost all relationships between proteins whose sequence identities are >30%. For more distantly related proteins, they do much less well; only one-half of the relationships between proteins with 20–30% identity are found. Because many homologs have low sequence similarity, most distant relationships cannot be detected by any pairwise comparison method; however, those which are identified may be used with confidence.
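The abstract's reliability check (whether the number of false positives agrees with the reported E-values) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy hit list, and the counts `n_queries` and `n_true_pairs` are hypothetical, chosen only to show the bookkeeping. It assumes each hit carries an E-value and a structure-derived label saying whether the pair is truly homologous.

```python
from typing import List, Tuple

# Each hit is (E-value reported by the search program, True if the pair is
# a real homolog according to a structural classification).
Hit = Tuple[float, bool]


def errors_per_query(hits: List[Hit], n_queries: int, threshold: float) -> float:
    """Observed false positives per query among hits with E-value <= threshold."""
    false_positives = sum(
        1 for e_value, is_homolog in hits
        if e_value <= threshold and not is_homolog
    )
    return false_positives / n_queries


def coverage(hits: List[Hit], n_true_pairs: int, threshold: float) -> float:
    """Fraction of all true homologous pairs recovered at the threshold."""
    true_positives = sum(
        1 for e_value, is_homolog in hits
        if e_value <= threshold and is_homolog
    )
    return true_positives / n_true_pairs


# Toy data (hypothetical, not taken from the paper).
toy_hits: List[Hit] = [
    (1e-20, True), (1e-5, True), (0.02, True), (0.5, False),
    (0.8, True), (3.0, False), (7.0, False), (9.0, True),
]
n_queries = 4      # hypothetical number of query sequences
n_true_pairs = 10  # hypothetical number of true homologous pairs

observed = errors_per_query(toy_hits, n_queries, threshold=1.0)
found = coverage(toy_hits, n_true_pairs, threshold=1.0)
print(f"errors per query at E <= 1.0: {observed}")
print(f"coverage at E <= 1.0: {found}")
```

If the E-values are well calibrated, the observed errors per query at a threshold should be close to the threshold itself; a score that exaggerates significance would yield far more false positives than it predicts.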
This publication has 39 references indexed in Scilit:
- SCOP: A structural classification of proteins database for the investigation of sequences and structures. Published by Elsevier, 2006
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997
- CATH – a hierarchic classification of protein domain structures. Published by Elsevier, 1997
- An Assessment of Amino Acid Exchange Matrices in Aligning Protein Sequences: The Twilight Zone Revisited. Journal of Molecular Biology, 1995
- A Structural Basis for Sequence Comparisons. Journal of Molecular Biology, 1993
- Basic local alignment search tool. Journal of Molecular Biology, 1990
- Evaluation and improvements in the automatic alignment of protein sequences. Protein Engineering, Design and Selection, 1987
- Molecular packing and intermolecular contacts of sickling deer type III hemoglobin. Journal of Molecular Biology, 1979
- An improved method of testing for evolutionary homology. Journal of Molecular Biology, 1966
- Structure and function of haemoglobin. Journal of Molecular Biology, 1965