Piers
- 1 June 2004
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGMOD Record
- Vol. 33 (2) , 39-44
- https://doi.org/10.1145/1024694.1024701
Abstract
Growing interest in genomic research has resulted in the creation of huge biological sequence databases. In this paper, we present a hash-based pier model for efficient homology search in large DNA sequence databases. In our model, only certain segments in the databases, called 'piers', need to be accessed during searches, as opposed to other approaches, which require a full scan of the biological sequence database. To further improve search efficiency, the piers are stored in a specially designed hash table, which helps to avoid expensive alignment operations. The hash table is small enough to reside in main memory, hence avoiding I/O in the search steps. We show theoretically and empirically that the proposed approach can efficiently detect biological sequences that are similar to a query sequence with very high sensitivity.
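The general idea of indexing only selected segments in an in-memory hash table can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not say how piers are chosen or how the hash table is designed, so the fixed-length, evenly spaced segments, the exact-match lookup, and the names `build_pier_table`, `candidate_hits`, `pier_len`, and `stride` are all assumptions made for illustration.

```python
from collections import defaultdict


def build_pier_table(sequences, pier_len=4, stride=4):
    """Index one fixed-length segment every `stride` bases.

    Assumption: "piers" are modelled here as evenly spaced substrings;
    the abstract does not specify the real selection rule.
    """
    table = defaultdict(list)
    for seq_id, seq in sequences.items():
        for start in range(0, len(seq) - pier_len + 1, stride):
            table[seq[start:start + pier_len]].append((seq_id, start))
    return table


def candidate_hits(table, query, pier_len=4):
    """Look up every window of the query in the table.

    Returns (sequence id, database offset, query offset) triples that
    would be passed on to a later alignment step. Only the indexed
    segments are consulted; the database itself is never scanned.
    """
    hits = set()
    for q_start in range(len(query) - pier_len + 1):
        window = query[q_start:q_start + pier_len]
        for seq_id, db_start in table.get(window, ()):
            hits.add((seq_id, db_start, q_start))
    return sorted(hits)


# Toy data, invented for this example.
database = {"s1": "ACGTTGCAACGT", "s2": "GGGGCCCCTTTT"}
table = build_pier_table(database)
hits = candidate_hits(table, "TTGCAA")
```

With a stride equal to the segment length, the table holds roughly one entry per `pier_len` bases, which is why an index of this kind can stay much smaller than the database. Exact-match lookup is a simplification; a sensitive homology search would also need to tolerate mismatches.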