The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter

1 November 1996

journal article
Published by the Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Information Theory

Vol. 42 (6), pp. 2102-2117
https://doi.org/10.1109/18.556600

Abstract

We observe a training set $Q$ composed of $l$ labeled samples $\{(X_1,\theta_1),\dots,(X_l,\theta_l)\}$ and $u$ unlabeled samples $\{X'_1,\dots,X'_u\}$. The labels $\theta_i$ are independent random variables satisfying $\Pr\{\theta_i=1\}=\eta$ and $\Pr\{\theta_i=2\}=1-\eta$. The labeled observations $X_i$ are independently distributed with conditional density $f_{\theta_i}(\cdot)$ given $\theta_i$. Let $(X_0,\theta_0)$ be a new sample, distributed independently of, and identically to, the samples in the training set. We observe $X_0$ and wish to infer the classification $\theta_0$. In this paper we first assume that the distributions $f_1(\cdot)$ and $f_2(\cdot)$ are given and that the mixing parameter is unknown. We show that the relative value of labeled and unlabeled samples in reducing the risk of optimal classifiers is the ratio of the Fisher informations they carry about the parameter $\eta$. We then assume that two densities $g_1(\cdot)$ and $g_2(\cdot)$ are given, but we do not know whether $g_1(\cdot)=f_1(\cdot)$ and $g_2(\cdot)=f_2(\cdot)$ or the opposite holds, nor do we know $\eta$. Thus the learning problem consists of both estimating the optimum partition of the observation space and assigning the classifications to the decision regions. Here, we show that labeled samples are necessary to construct a classification rule and that they are exponentially more valuable than unlabeled samples.
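As an illustration only (not the paper's derivation), the quantities in the abstract can be computed in a simple assumed case. One labeled sample $(X,\theta)$ carries $I_l(\eta)=1/(\eta(1-\eta))$ about $\eta$, because $f_\theta(x)$ does not depend on $\eta$; one unlabeled sample $X$, with mixture density $p=\eta f_1+(1-\eta)f_2$, carries $I_u(\eta)=\int (f_1-f_2)^2/p\,dx$. The minimal sketch below assumes unit-variance Gaussian class densities and evaluates both, so the ratio $I_u/I_l$ can be inspected. It also sketches the second setting under a further assumption: $g_1=N(+\mu,1)$, $g_2=N(-\mu,1)$, and a hypothetical rule that keeps the assignment with the larger likelihood on the labeled samples. In that toy case the probability of choosing the wrong assignment is $\Phi(-|\mu|\sqrt{l})$, which decays exponentially in $l$. All function names are hypothetical.

```python
import functools
import math


def normal_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fisher_info_labeled(eta):
    """Fisher information about eta carried by one labeled sample (X, theta).

    The factor f_theta(x) does not involve eta, so only the Bernoulli label
    contributes: I_l = 1 / (eta * (1 - eta)).
    """
    return 1.0 / (eta * (1.0 - eta))


def fisher_info_unlabeled(eta, f1, f2, lo=-20.0, hi=20.0, n=40001):
    """Fisher information about eta carried by one unlabeled sample X.

    X has mixture density p = eta*f1 + (1-eta)*f2 and score (f1 - f2) / p,
    so I_u = integral of (f1 - f2)^2 / p dx. The integral is approximated
    with the trapezoidal rule on [lo, hi], assumed wide enough to contain
    essentially all of the mass of f1 and f2.
    """
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        a, b = f1(x), f2(x)
        p = eta * a + (1.0 - eta) * b
        term = (a - b) ** 2 / p if p > 0.0 else 0.0
        total += term * (0.5 if i in (0, n - 1) else 1.0)  # trapezoid weights
    return total * h


def assignment_error_prob(n_labeled, mu):
    """Probability that n_labeled (= l) labeled samples pick the wrong assignment.

    Hypothetical setting: g1 = N(+mu, 1), g2 = N(-mu, 1); the rule keeps the
    assignment (g1 -> class 1, or g1 -> class 2) with the larger likelihood
    on the labeled samples. Under the true assignment each labeled sample
    adds an N(2 mu^2, 4 mu^2) term to the log-likelihood ratio, whatever its
    label, so the error probability is Phi(-|mu| * sqrt(l)).
    """
    z = -abs(mu) * math.sqrt(n_labeled)
    return 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal CDF at z


if __name__ == "__main__":
    eta = 0.5
    f1 = functools.partial(normal_pdf, mu=1.0)
    f2 = functools.partial(normal_pdf, mu=-1.0)
    i_l = fisher_info_labeled(eta)
    i_u = fisher_info_unlabeled(eta, f1, f2)
    print(f"I_l = {i_l:.4f}, I_u = {i_u:.4f}, I_u / I_l = {i_u / i_l:.4f}")
    for n_labeled in (1, 4, 9):
        # roughly 0.159, 0.0228, 0.00135
        print(n_labeled, assignment_error_prob(n_labeled, 1.0))
```

Because the full labeled pair contains $X$ itself, $I_u \le I_l$ always holds, and $I_u$ approaches $I_l$ as the class densities separate. The fixed-grid trapezoidal rule is used only to keep the sketch dependency-free; `scipy.integrate.quad` would be a natural substitute.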
