Hierarchical file organization and its application to similar-string matching
- 1 September 1983
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Database Systems
- Vol. 8 (3) , 410-433
- https://doi.org/10.1145/319989.319994
Abstract
The automatic correction of misspelled inputs is discussed from a viewpoint of similar-string matching. First a hierarchical file organization based on a linear ordering of records is presented for retrieving records highly similar to any input query. Then the spelling problem is attacked by constructing a hierarchical file for a set of strings in a dictionary of English words. The spelling correction steps proceed as follows: (1) find one of the best-match strings which are most similar to a query, (2) expand the search area for obtaining the good-match strings, and (3) interrupt the file search as soon as the required string is displayed. Computational experiments verify the performance of the proposed methods for similar-string matching under the UNIX™ time-sharing system.Keywords
This publication has 24 references indexed in Scilit:
- Electronic information interchange in an office environmentIBM Systems Journal, 1981
- The keystroke-level model for user performance time with interactive systemsCommunications of the ACM, 1980
- Office Information Systems and Computer ScienceACM Computing Surveys, 1980
- Data Structures for Range SearchingACM Computing Surveys, 1979
- Ubiquitous B-TreeACM Computing Surveys, 1979
- A Family of Similarity Measures Between Two StringsPublished by Institute of Electrical and Electronics Engineers (IEEE) ,1979
- An Algorithm for Finding Best Matches in Logarithmic Expected TimeACM Transactions on Mathematical Software, 1977
- Multidimensional binary search trees used for associative searchingCommunications of the ACM, 1975
- A Graph Theoretic Analysis of Pattern Classification via Tamura's Fuzzy RelationIEEE Transactions on Systems, Man, and Cybernetics, 1974
- Some approaches to best-match file searchingCommunications of the ACM, 1973