Complete inverted files for efficient text retrieval and analysis
- 1 July 1987
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in Journal of the ACM
- Vol. 34 (3) , 578-595
- https://doi.org/10.1145/28869.28873
Abstract
Given a finite set of texts S = {w1, … , wk} over some fixed finite alphabet Σ, a complete inverted file for S is an abstract data type that provides the functions find(w), which returns the longest prefix of w that occurs (as a subword of a word) in S; freq(w), which returns the number of times w occurs in S; and locations(w), which returns the set of positions where w occurs in S. A data structure that implements a complete inverted file for S that occupies linear space and can be built in linear time, using the uniform-cost RAM model, is given. Using this data structure, the time for each of the above query functions is optimal. To accomplish this, techniques from the theory of finite automata and the work on suffix trees are used to build a deterministic finite automaton that recognizes the set of all subwords of the set S. This automaton is then annotated with additional information and compacted to facilitate the desired query functions. The result is a data structure that is smaller and more flexible than the suffix tree.Keywords
This publication has 9 references indexed in Scilit:
- Sequence landscapesNucleic Acids Research, 1986
- The smallest automation recognizing the subwords of a textTheoretical Computer Science, 1985
- A method for detecting structure in polygonsPattern Recognition, 1981
- Efficient On-Line Construction and Correction of Position TreesSIAM Journal on Computing, 1980
- Content-Addressable MemoriesPublished by Springer Nature ,1980
- A Space-Economical Suffix Tree Construction AlgorithmJournal of the ACM, 1976
- Analysis and performance of inverted data base structuresCommunications of the ACM, 1975
- PATRICIA—Practical Algorithm To Retrieve Information Coded in AlphanumericJournal of the ACM, 1968
- Linear automaton transformationsProceedings of the American Mathematical Society, 1958