Classification with finite memory

Abstract
Consider the following situation. A device called a classifier observes a probability law P on l-vectors from an alphabet of size A. Its task is to observe a second probability law Q and decide whether P ≡ Q or whether P and Q are sufficiently different according to some appropriate criterion. If the classifier has an unlimited memory available (so that it can remember P(z) exactly for every z), this is a simple matter. In fact, for most differentness criteria a finite memory of 2(log A)l + o(l) bits suffices (for large l): it is enough to store a finite-precision approximation of P(z) for each of the A^l vectors z. It is shown, in a sense made precise in this paper, that a memory of only about 2Rl bits is required, where R < log A and R is closely related to the entropy of P. Further, it is shown that if, instead of being given P(z) for all z, the classifier is given a training sequence drawn according to P that can be stored using about 2Rl bits, then correct classification is also possible.
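As a rough numerical illustration of the two memory counts above, consider the following minimal sketch. All values in it are hypothetical, and R is approximated by the per-symbol entropy of P, which the abstract describes only as closely related to R:

import math

# Hypothetical i.i.d. source: alphabet of size A, blocks of length l.
A = 4
l = 20
P = [0.7, 0.1, 0.1, 0.1]  # assumed per-symbol probabilities, for illustration

# Naive finite-memory scheme from the abstract: store a finite-precision
# approximation of P(z) for all A^l vectors z, costing about
# 2(log A)l + o(l) bits; only the leading term is computed here.
naive_bits = 2 * math.log2(A) * l

# Entropy-based count from the abstract: about 2Rl bits, with R < log A.
# Here R is taken to be the per-symbol entropy H(P) (an assumption).
R = -sum(p * math.log2(p) for p in P)
entropy_bits = 2 * R * l

print(f"2(log A)l : ~{naive_bits:.0f} bits")   # 80 bits for these values
print(f"2Rl       : ~{entropy_bits:.0f} bits")  # ~54 bits, since R = 1.36 < 2 = log A

For a low-entropy source the gap between 2Rl and 2(log A)l grows linearly in l, which is the saving the abstract refers to.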
