On the evaluation of document analysis components by recall, precision, and accuracy

1 January 1999

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

pp. 713-716
https://doi.org/10.1109/icdar.1999.791887

Abstract

In document analysis, it is common to prove the usefulness of a component by an experimental evaluation: the respective algorithms are applied to a test sample, and effectiveness measures such as recall, precision, and accuracy are computed. The goal of such an evaluation is twofold. On the one hand, it shows that the absolute effectiveness of the algorithm is acceptable for practical use; on the other hand, it can prove that the algorithm is more or less effective than another algorithm. We argue that experimental evaluations on relatively small test sets, as are very common in document analysis, must be treated with extreme care from a statistical point of view. In fact, the statements that can be derived from such evaluations are surprisingly weak.
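To make the statistical point concrete, here is a minimal sketch, assuming that each measure is treated as a binomial proportion k/n: recall has the relevant items as its denominator, precision the reported items, and accuracy the whole test sample. It attaches an exact (Clopper-Pearson) confidence interval to each estimate, computed by bisection on the binomial CDF so that only the standard library is needed. This is an illustrative technique chosen here, not necessarily the procedure used in the paper; the function names and the confusion counts in the example are hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); exact sum, adequate for n up to about 1000."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target: float, iters: int = 60) -> float:
    """Bisection on [0, 1] for the p where an increasing function g reaches target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for the proportion k/n."""
    if not 0 <= k <= n or n == 0:
        raise ValueError("need 0 <= k <= n and n > 0")
    # Lower bound solves P(X >= k) = alpha/2; P(X >= k) increases with p.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound solves P(X <= k) = alpha/2, i.e. P(X > k) = 1 - alpha/2.
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1.0 - binom_cdf(k, n, p), 1.0 - alpha / 2)
    return lower, upper


def effectiveness_with_ci(tp: int, fp: int, fn: int, tn: int, alpha: float = 0.05):
    """(estimate, lower, upper) for recall, precision and accuracy.

    Each measure has its own denominator, so its effective sample size
    can be much smaller than the size of the test set.
    """
    counts = {
        "recall": (tp, tp + fn),                   # relevant items that were found
        "precision": (tp, tp + fp),                # reported items that are relevant
        "accuracy": (tp + tn, tp + fp + fn + tn),  # all correct decisions
    }
    return {name: (k / n, *clopper_pearson(k, n, alpha))
            for name, (k, n) in counts.items()}


if __name__ == "__main__":
    # Hypothetical test sample: 50 documents, 20 of them relevant.
    for name, (est, lo, hi) in effectiveness_with_ci(tp=16, fp=3, fn=4, tn=27).items():
        print(f"{name:9s} {est:.3f}  95% CI [{lo:.3f}, {hi:.3f}]")
```

Even in this small hypothetical example, the recall interval, which rests on only 20 relevant documents, is far wider than the accuracy interval based on all 50. Two algorithms whose point estimates differ by a few percentage points can therefore easily have heavily overlapping intervals. Overlap is only a rough indication, however; a proper comparison of two algorithms on the same sample would call for a paired test.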
