ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches

Nucleic Acids Research, Vol. 29 (7), 1647-1652, 1 April 2001
Published by Oxford University Press (OUP); open access research article
https://doi.org/10.1093/nar/29.7.1647

Abstract

There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amount of genomic sequence data available. Parallel processing capabilities in the form of single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm first exploits parallelism to compute, very rapidly, the exact optimal ungapped alignment score for every diagonal in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith–Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement over existing heuristics. The sensitivity and specificity of ParAlign were found to be as good as those of Smith–Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
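The three-stage search described in the abstract (exact ungapped score per diagonal, an approximate gapped score built from several diagonals, then Smith–Waterman on the selected sequences) can be illustrated with a minimal scalar Python sketch. It is not ParAlign's implementation: it uses no SIMD, the toy scoring scheme (+2 match, -1 mismatch) and the linear gap penalty are assumptions, and `approx_gapped_score` is a hypothetical stand-in for the paper's diagonal-combining heuristic, whose details are not given here. All function names and parameters (`threshold`, `gap_penalty`, `gap`, `keep`) are illustrative only.

```python
from typing import Callable, List, Tuple

ScoreFn = Callable[[str, str], int]


def toy_score(a: str, b: str) -> int:
    """Assumed toy scheme: +2 for identical residues, -1 otherwise.
    A real protein search would use a substitution matrix such as BLOSUM62."""
    return 2 if a == b else -1


def diagonal_ungapped_scores(q: str, d: str, score: ScoreFn = toy_score) -> List[int]:
    """Exact optimal ungapped local score on every diagonal (offset k = j - i).

    Each diagonal is scanned once with a running sum that is reset at zero
    (Kadane's maximum-subarray rule). ParAlign evaluates diagonals in parallel
    with SIMD; this loop is only a scalar illustration.
    """
    m, n = len(q), len(d)
    scores = []
    for k in range(-(m - 1), n):  # m + n - 1 diagonals in total
        i, j = max(0, -k), max(0, k)
        running = best = 0
        while i < m and j < n:
            running = max(0, running + score(q[i], d[j]))
            best = max(best, running)
            i += 1
            j += 1
        scores.append(best)
    return scores


def approx_gapped_score(diag_scores: List[int], threshold: int = 4, gap_penalty: int = 5) -> int:
    """HYPOTHETICAL combining rule, not ParAlign's: start from the best diagonal,
    then add each other diagonal scoring above `threshold`, minus `gap_penalty`."""
    ranked = sorted(diag_scores, reverse=True)
    if not ranked:
        return 0
    total = ranked[0]
    for s in ranked[1:]:
        if s > threshold:
            total += max(0, s - gap_penalty)
    return total


def smith_waterman(q: str, d: str, score: ScoreFn = toy_score, gap: int = 3) -> int:
    """Smith-Waterman local alignment score, using a linear gap penalty for brevity."""
    prev = [0] * (len(d) + 1)
    best = 0
    for i in range(1, len(q) + 1):
        cur = [0] * (len(d) + 1)
        for j in range(1, len(d) + 1):
            cur[j] = max(0,
                         prev[j - 1] + score(q[i - 1], d[j - 1]),  # (mis)match
                         prev[j] - gap,                            # gap in d
                         cur[j - 1] - gap)                         # gap in q
            best = max(best, cur[j])
        prev = cur
    return best


def search(query: str, database: List[str], keep: int = 2) -> List[Tuple[str, int]]:
    """Rank by the approximate score, then rescore only the top `keep` with Smith-Waterman."""
    ranked = sorted(database,
                    key=lambda s: approx_gapped_score(diagonal_ungapped_scores(query, s)),
                    reverse=True)
    return [(s, smith_waterman(query, s)) for s in ranked[:keep]]


print(search("AC", ["AC", "GG", "ACAC"]))  # [('AC', 4), ('ACAC', 4)]
```

Because every ungapped diagonal segment is itself a valid local alignment, the best diagonal score can never exceed the full Smith–Waterman score. That makes the diagonal scan a cheap lower bound for ranking, so the quadratic dynamic-programming step only has to run on the few sequences that survive the filter.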
