When indexing equals compression
- 1 October 2006
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Algorithms
- Vol. 2 (4) , 611-639
- https://doi.org/10.1145/1198513.1198521
Abstract
We report on a new experimental analysis of high-order entropy-compressed suffix arrays, which retains the theoretical performance of previous work and represents an improvement in practice. Our experiments indicate that the resulting text index offers state-of-the-art compression. In particular, we require roughly 20% of the original text size---without requiring a separate instance of the text. We can additionally use a simple notion to encode and decode block-sorting transforms (such as the Burrows--Wheeler transform), achieving a compression ratio comparable to that of bzip2. We also provide a compressed representation of suffix trees (and their associated text) in a total space that is comparable to that of the text alone compressed with gzip.Keywords
This publication has 24 references indexed in Scilit:
- Indexing compressed textJournal of the ACM, 2005
- Boosting textual compression in optimal linear timeJournal of the ACM, 2005
- The Level Ancestor Problem simplifiedTheoretical Computer Science, 2004
- New text indexing functionalities of the compressed suffix arraysJournal of Algorithms, 2003
- Burrows–Wheeler compression with variable length integer codesSoftware: Practice and Experience, 2002
- Space Efficient Suffix TreesJournal of Algorithms, 2001
- Arithmetic coding revisitedACM Transactions on Information Systems, 1998
- Fractional cascading: I. A data structuring techniqueAlgorithmica, 1986
- A locally adaptive data compression schemeCommunications of the ACM, 1986
- A Space-Economical Suffix Tree Construction AlgorithmJournal of the ACM, 1976