Indexing compressed text
Top Cited Papers
- 1 July 2005
- journal article
- Published by Association for Computing Machinery (ACM) in Journal of the ACM
- Vol. 52 (4) , 552-581
- https://doi.org/10.1145/1082036.1082039
Abstract
We design two compressed data structures for the full-text indexing problem that support efficient substring searches using roughly the space required for storing the text in compressed form.Our first compressed data structure retrieves the occ occurrences of a pattern P[1,p] within a text T[1,n] in O(p + occ log1+ε n) time for any chosen ε, 0nHk(T) + o(n) bits of storage, where Hk(T) is the kth order empirical entropy of T. The space usage is Θ(n) bits in the worst case and o(n) bits for compressible texts. This data structure exploits the relationship between suffix arrays and the Burrows--Wheeler Transform, and can be regarded as a compressed suffix array.Our second compressed data structure achieves O(p+occ) query time using O(nHk(T)logε n) + o(n) bits of storage for any chosen ε, 0output-sensitive query time using o(nlog n) bits in the worst case. This second data structure builds upon the first one and exploits the interplay between two compressors: the Burrows--Wheeler Transform and the LZ78 algorithm.Keywords
This publication has 21 references indexed in Scilit:
- Boosting textual compression in optimal linear timeJournal of the ACM, 2005
- New text indexing functionalities of the compressed suffix arraysJournal of Algorithms, 2003
- Annotating Large Genomes With Exact Word MatchesGenome Research, 2003
- An experimental study of a compressed indexInformation Sciences, 2001
- Space Efficient Suffix TreesJournal of Algorithms, 2001
- A time and space efficient data structure for string searching on large textsInformation Processing Letters, 1996
- The Burrows-Wheeler Transform for Block Sorting Text Compression: Principles and ImprovementsThe Computer Journal, 1996
- A locally adaptive data compression schemeCommunications of the ACM, 1986
- Compression of individual sequences via variable-rate codingIEEE Transactions on Information Theory, 1978
- A Space-Economical Suffix Tree Construction AlgorithmJournal of the ACM, 1976