Improved tools for biological sequence comparison.
- 1 April 1988
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 85 (8), 2444-2448
- https://doi.org/10.1073/pnas.85.8.2444
Abstract
We have developed three computer programs for comparisons of protein and DNA sequences. They can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity. The FASTA program is a more sensitive derivative of the FASTP program, which can be used to search protein or DNA sequence data bases and can compare a protein sequence to a DNA sequence data base by translating the DNA data base as it is searched. FASTA includes an additional step in the calculation of the initial pairwise similarity score that allows multiple regions of similarity to be joined to increase the score of related sequences. The RDF2 program can be used to evaluate the significance of similarity scores using a shuffling method that preserves local sequence composition. The LFASTA program can display all the regions of local similarity between two sequences with scores greater than a threshold, using the same scoring parameters and a similar alignment algorithm; these local similarities can be displayed as a "graphic matrix" plot or as individual alignments. In addition, these programs have been generalized to allow comparison of DNA or protein sequences based on a variety of alternative scoring matrices.
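The abstract says that RDF2 evaluates the significance of a similarity score with a shuffling method that preserves local sequence composition. The sketch below shows one way such a test could look. It is not the RDF2 implementation. The function names, the fixed-size window scheme, the default window of 10 residues, and the toy ungapped diagonal score are all assumptions made for illustration; the abstract does not specify any of them.

```python
import random
import statistics


def local_shuffle(seq, window, rng):
    """Shuffle residues inside consecutive fixed-size windows.

    Each window keeps its own residue composition, so local composition
    is preserved while residue order is randomized (assumed scheme).
    """
    pieces = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)
        pieces.append("".join(block))
    return "".join(pieces)


def best_diagonal_score(a, b, match=1, mismatch=-1):
    """Toy stand-in for a similarity score (hypothetical, not FASTA's).

    Returns the best-scoring ungapped local run found on any diagonal.
    """
    best = 0
    for offset in range(-(len(b) - 1), len(a)):
        run = 0
        i = max(offset, 0)
        j = i - offset
        while i < len(a) and j < len(b):
            run = max(0, run + (match if a[i] == b[j] else mismatch))
            best = max(best, run)
            i += 1
            j += 1
    return best


def shuffle_z_score(query, target, window=10, trials=100, seed=0):
    """Compare the observed score with scores against locally shuffled targets."""
    rng = random.Random(seed)
    observed = best_diagonal_score(query, target)
    shuffled = [
        best_diagonal_score(query, local_shuffle(target, window, rng))
        for _ in range(trials)
    ]
    mean = statistics.mean(shuffled)
    sd = statistics.pstdev(shuffled)
    # z is undefined when every shuffled score is identical
    z = (observed - mean) / sd if sd else float("nan")
    return observed, mean, sd, z
```

Calling `shuffle_z_score(query, target)` returns the observed score, the mean and standard deviation of the shuffled scores, and the resulting z value. Under these assumptions, a large z value suggests that the observed similarity is unlikely to arise from local composition alone.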