Off-line compression by greedy textual substitution
- 1 November 2000
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in Proceedings of the IEEE
- Vol. 88 (11) , 1733-1744
- https://doi.org/10.1109/5.892709
Abstract
Greedy off-line textual substitution refers to the following approach to compression or structural inference. Given a long text string x, a substring W is identified such that replacing all instances of W in X except one by a suitable pair of pointers yields the highest possible contraction of X; the process is then repeated on the contracted text string until substrings capable of producing contractions can no longer be found. This paper examines computational issues arising in the implementation of this paradigm and describes some applications and experiments.Keywords
This publication has 33 references indexed in Scilit:
- An efficient network management method using combined sampling period decision algorithm and Dynamic ID allocation in CAN-based control systemsPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2007
- Extended application of suffix trees to data compressionPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2002
- A guaranteed compression scheme for repetitive DNA sequencesPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2002
- Reducing the space requirement of suffix treesSoftware: Practice and Experience, 1999
- How Many Squares Can a String Contain?Journal of Combinatorial Theory, Series A, 1998
- Detection of significant patterns by compression algorithms: the case of approximate tandem repeats in DNA sequencesBioinformatics, 1997
- Data structures and algorithms for the string statistics problemAlgorithmica, 1996
- On-line construction of suffix treesAlgorithmica, 1995
- Discovery by Minimal Length Encoding: A case study in molecular evolutionMachine Learning, 1993
- Self-alignments in words and their applicationsJournal of Algorithms, 1992