The relationship between greedy parsing and symbolwise text compression
- 1 July 1994
- journal article
- Published by Association for Computing Machinery (ACM) in Journal of the ACM
- Vol. 41 (4), 708-724
- https://doi.org/10.1145/179812.179892
Abstract
Text compression methods can be divided into two classes: symbolwise and parsing. Symbolwise methods assign codes to individual symbols, while parsing methods assign codes to groups of consecutive symbols (phrases). The set of phrases available to a parsing method is referred to as a dictionary. The vast majority of parsing methods in the literature use greedy parsing (including nearly all variations of the popular Ziv-Lempel methods). When greedy parsing is used, the coder processes a string from left to right, at each step encoding as many symbols as possible with a phrase from the dictionary. This parsing strategy is not optimal, but an optimal method cannot guarantee a bounded coding delay. An important problem in compression research has been to establish the relationship between symbolwise methods and parsing methods. This paper extends prior work that shows that there are symbolwise methods that simulate a subset of greedy parsing methods. We provide a more general algorithm that takes any nonadaptive greedy parsing method and constructs a symbolwise method that achieves exactly the same compression. Combined with the existence of symbolwise equivalents for two of the most significant adaptive parsing methods, this result gives added weight to the idea that research aimed at increasing compression should concentrate on symbolwise methods, while parsing methods should be chosen for speed or temporary storage considerations.
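The greedy strategy described in the abstract, and the sense in which it is not optimal, can be illustrated with a small sketch. This is not the paper's construction: the function names `greedy_parse` and `optimal_parse`, the toy dictionary, and the assumption that every phrase costs the same number of bits (so that fewer phrases means better compression) are all chosen here for illustration only.

```python
def greedy_parse(text, dictionary):
    """Parse left to right, always taking the longest dictionary phrase."""
    max_len = max(len(p) for p in dictionary)
    phrases = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                phrases.append(candidate)
                i += length
                break
        else:
            raise ValueError(f"no phrase matches at position {i}")
    return phrases


def optimal_parse(text, dictionary):
    """Minimise the phrase count by dynamic programming over the whole input.

    Assumes a fixed cost per phrase (an illustrative simplification).
    """
    n = len(text)
    max_len = max(len(p) for p in dictionary)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[i]: fewest phrases needed for text[i:]
    choice = [0] * (n + 1)   # choice[i]: length of the phrase chosen at i
    best[n] = 0
    for i in range(n - 1, -1, -1):
        for length in range(1, min(max_len, n - i) + 1):
            if text[i:i + length] in dictionary and best[i + length] + 1 < best[i]:
                best[i] = best[i + length] + 1
                choice[i] = length
    if best[0] == inf:
        raise ValueError("text cannot be parsed with this dictionary")
    phrases = []
    i = 0
    while i < n:
        phrases.append(text[i:i + choice[i]])
        i += choice[i]
    return phrases


# Hypothetical toy dictionary, not taken from the paper.
toy_dictionary = {"a", "b", "c", "d", "ab", "bcd"}
print(greedy_parse("abcd", toy_dictionary))   # ['ab', 'c', 'd']
print(optimal_parse("abcd", toy_dictionary))  # ['a', 'bcd']
```

On this input the greedy parser commits to `ab` and then needs two more phrases, while the optimal parse uses only two phrases in total. The dynamic program works backwards from the end of the string, which hints at why an optimal parser may have to see arbitrarily far ahead before emitting a phrase, whereas the greedy parser only ever looks at the next `max_len` symbols.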