Fast Comparison of a DNA Sequence with a Protein Sequence Database
- 1 January 1996
- journal article
- research article
- Published by Mary Ann Liebert Inc in Genome Science and Technology
- Vol. 1 (4) , 281-291
- https://doi.org/10.1089/mcg.1996.1.281
Abstract
We describe a computer program, named DNA-Protein Search (DPS), for comparing a megabase DNA sequence with a protein sequence database. The DPS program addresses the problems of frameshifts and introns in the DNA sequence. The DPS program was used to compare each of the following sequences with the Swiss-Prot database: the 1.8-megabase sequence of the Haemophilus influenzae Rd genome, the 0.58-megabase sequence of the Mycoplasma genitalium genome, and the 0.56-megabase sequence of Saccharomyces cerevisiae chromosome VIII. The comparisons found new regions that are similar to protein sequences. The sensitivity of DPS was evaluated using as test data the known coding regions of the three DNA sequences. The results demonstrate that the DPS program is a useful tool for finding the coding regions of the DNA sequence. The DPS program uses an order of magnitude less computer memory and is several times faster than the BLASTX program.
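The abstract does not describe how DPS works internally, so the following is only a minimal illustration of why frameshifts matter when DNA is compared with protein sequences, not a reconstruction of the DPS algorithm. It translates a DNA string in all six reading frames using the standard genetic code. The function names (`reverse_complement`, `translate`, `six_frame_translations`) and the toy sequences are assumptions introduced for this sketch.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third position varies fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels ('+1'..'+3', '-1'..'-3') to protein translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    original = "ATGGCCAAGTAA"    # toy coding sequence
    shifted = "ATGGCCGAAGTAA"    # same sequence with one inserted base (G)
    print(six_frame_translations(original)["+1"])
    print(six_frame_translations(shifted)["+1"])
    print(six_frame_translations(shifted)["+2"])
```

In the toy input, a single inserted base moves the remainder of the coding region from frame +1 to frame +2. A comparison that looks at each frame in isolation sees only a fragment of the protein in each, which is the kind of problem a frameshift-aware comparison has to handle.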