Efficient large-scale sequence comparison by locality-sensitive hashing

1 May 2001

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 17 (5), 419-428
https://doi.org/10.1093/bioinformatics/17.5.419

Abstract

Motivation: Comparison of multimegabase genomic DNA sequences is a popular technique for finding and annotating conserved genome features. Performing such comparisons entails finding many short local alignments between sequences up to tens of megabases in length. To process such long sequences efficiently, existing algorithms find alignments by expanding around short runs of matching bases with no substitutions or other differences. Unfortunately, exact matches that are short enough to occur often in significant alignments also occur frequently by chance in the background sequence. Thus, these algorithms must trade off between efficiency and sensitivity to features without long exact matches.
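The trade-off described above can be illustrated with a minimal sketch of exact-match seeding, the step that expansion-based aligners start from. This is an illustration only, not the method of this article or of any particular tool; the function name `exact_seeds`, the seed length `k`, and the toy sequences are assumptions made for the example.

```python
from collections import defaultdict


def exact_seeds(a: str, b: str, k: int) -> list[tuple[int, int]]:
    """Return all (i, j) with a[i:i+k] == b[j:j+k], i.e. exact k-base seeds.

    Hypothetical helper for illustration; a real aligner would go on to
    extend each seed into a local alignment.
    """
    # Index every k-mer of the first sequence by its start position.
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)

    # Look up every k-mer of the second sequence in that index.
    seeds = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            seeds.append((i, j))
    return seeds


if __name__ == "__main__":
    # Toy sequences: positions 2-10 are similar but differ at position 6.
    a = "TTACGTGCAAGG"
    b = "CCACGTCCAAGA"
    for k in (2, 4, 5):
        print(k, exact_seeds(a, b, k))
```

In this toy case the two sequences share a similar region interrupted by one substitution. With `k = 5` no exact seed exists, so the region is missed; with `k = 4` both halves are found; with `k = 2` the region is still found, but a chance match off the true diagonal appears as well. On real multimegabase sequences such chance matches are far more numerous, which is the efficiency cost of short seeds.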
