Abstract
We examine the effects of simulated OCR errors on Boolean query models for information retrieval. We show that even relatively small amounts of such noise can have a significant impact. To address this issue, we formulate new variants of the traditional models by combining two classic paradigms for dealing with imprecise data: approximate string matching and fuzzy logic. Using a recall/precision analysis of an experiment involving nearly 60 million query evaluations, we demonstrate that the new fuzzy retrieval methods are generally more robust than their "sharp" counterparts.

This publication has 8 references indexed in Scilit: