A practical stemming algorithm for online search assistance

1 April 1983

journal article
Published by Emerald Publishing in Online Review

Vol. 7 (4), 301-318
https://doi.org/10.1108/eb024132

Abstract

Word truncation is a familiar technique employed by online searchers in order to increase recall in free text retrieval. The use of truncation, however, can be a mixed blessing since many words starting with the same root are not semantically or logically related. Consequently, online searchers often select words to be OR‐ed together from an alphabetic display of neighbouring terms in the inverted file in order to assure precision in the search. Automatic stemming algorithms typically function in a manner analogous to word truncation, with the added risk of the word roots being incorrectly identified by the algorithm. This paper describes a two‐phase stemming algorithm that consists of the identification of the word root and the automatic selection of ‘well‐formed’ morphological word variants from the actual inverted file entries that start with the same word root. The algorithm has been successfully used in an end‐user interface to NLM's Catline book catalog file.
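The two phases described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give its rules, so the suffix list, the minimum root length, and the function names `find_root`, `select_variants`, and `expand` are all hypothetical. Here "well-formed" is approximated, as an assumption, by requiring that the letters following the root form a listed suffix.

```python
from bisect import bisect_left

# Hypothetical suffix list, for illustration only. The abstract does not
# state which endings the paper's algorithm recognises.
SUFFIXES = ("ations", "ation", "ings", "ing", "ers", "er", "ed", "es", "s", "")


def find_root(word, min_root=3):
    """Phase 1 (assumed rule): strip the longest listed suffix,
    keeping at least `min_root` letters of the word."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if suffix and word.endswith(suffix) and len(word) - len(suffix) >= min_root:
            return word[: -len(suffix)]
    return word


def select_variants(root, index_terms):
    """Phase 2 (assumed rule): walk the alphabetically sorted index terms
    that start with `root` and keep only those whose remainder after the
    root is a listed suffix."""
    allowed = set(SUFFIXES)
    start = bisect_left(index_terms, root)  # first term >= root
    variants = []
    for term in index_terms[start:]:
        if not term.startswith(root):
            break  # past the block of terms sharing this root
        if term[len(root):] in allowed:
            variants.append(term)
    return variants


def expand(word, index_terms):
    """Run both phases: find the root, then pick variants from the index."""
    return select_variants(find_root(word), index_terms)


# Toy stand-in for the inverted file's term list; it must be sorted.
terms = sorted(["cat", "catalog", "catastrophe", "cats", "cattle"])
print(expand("cats", terms))
```

With this toy index, plain truncation on `cat` would match all five terms, while the sketch keeps only `cat` and `cats`. That is the precision gain the abstract attributes to selecting variants from the actual index entries rather than OR-ing everything that shares a root.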
