Hierarchical file organization and its application to similar-string matching

1 September 1983

journal article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Database Systems

Vol. 8 (3) , 410-433
https://doi.org/10.1145/319989.319994

Abstract

The automatic correction of misspelled inputs is discussed from a viewpoint of similar-string matching. First a hierarchical file organization based on a linear ordering of records is presented for retrieving records highly similar to any input query. Then the spelling problem is attacked by constructing a hierarchical file for a set of strings in a dictionary of English words. The spelling correction steps proceed as follows: (1) find one of the best-match strings which are most similar to a query, (2) expand the search area for obtaining the good-match strings, and (3) interrupt the file search as soon as the required string is displayed. Computational experiments verify the performance of the proposed methods for similar-string matching under the UNIX™ time-sharing system.

Keywords

This publication has 24 references indexed in Scilit:

Electronic information interchange in an office environment
IBM Systems Journal, 1981
The keystroke-level model for user performance time with interactive systems
Communications of the ACM, 1980
Office Information Systems and Computer Science
ACM Computing Surveys, 1980
Data Structures for Range Searching
ACM Computing Surveys, 1979
Ubiquitous B-Tree
ACM Computing Surveys, 1979
A Family of Similarity Measures Between Two Strings
Published by Institute of Electrical and Electronics Engineers (IEEE) ,1979
An Algorithm for Finding Best Matches in Logarithmic Expected Time
ACM Transactions on Mathematical Software, 1977
Multidimensional binary search trees used for associative searching
Communications of the ACM, 1975
A Graph Theoretic Analysis of Pattern Classification via Tamura's Fuzzy Relation
IEEE Transactions on Systems, Man, and Cybernetics, 1974
Some approaches to best-match file searching
Communications of the ACM, 1973