The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter
- 1 November 1996
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Information Theory
- Vol. 42 (6), 2102-2117
- https://doi.org/10.1109/18.556600
Abstract
We observe a training set $Q$ composed of $l$ labeled samples $\{(X_1,\theta_1),\ldots,(X_l,\theta_l)\}$ and $u$ unlabeled samples $\{X_1',\ldots,X_u'\}$. The labels $\theta_i$ are independent random variables satisfying $\Pr\{\theta_i=1\}=\eta$, $\Pr\{\theta_i=2\}=1-\eta$. The labeled observations $X_i$ are independently distributed with conditional density $f_{\theta_i}(\cdot)$ given $\theta_i$. Let $(X_0,\theta_0)$ be a new sample, independently distributed as the samples in the training set. We observe $X_0$ and we wish to infer the classification $\theta_0$. In this paper we first assume that the distributions $f_1(\cdot)$ and $f_2(\cdot)$ are given and that the mixing parameter is unknown. We show that the relative value of labeled and unlabeled samples in reducing the risk of optimal classifiers is the ratio of the Fisher informations they carry about the parameter $\eta$. We then assume that two densities $g_1(\cdot)$ and $g_2(\cdot)$ are given, but we do not know whether $g_1(\cdot)=f_1(\cdot)$ and $g_2(\cdot)=f_2(\cdot)$ or if the opposite holds, nor do we know $\eta$. Thus the learning problem consists of both estimating the optimum partition of the observation space and assigning the classifications to the decision regions. Here, we show that labeled samples are necessary to construct a classification rule and that they are exponentially more valuable than unlabeled samples.
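
The Fisher-information comparison in the first setting (known class densities, unknown mixing parameter) can be made concrete with a small numerical sketch. It is not taken from the paper: the unit-variance Gaussian class densities, the function names, and the integration grid are assumptions chosen for illustration. It relies on two standard facts. A labeled sample informs eta only through its Bernoulli label, so its Fisher information is 1/(eta(1-eta)). An unlabeled sample is drawn from the mixture eta·f1 + (1-eta)·f2, so its Fisher information is the integral of (f1 - f2)^2 / (eta·f1 + (1-eta)·f2).

```python
import math


def normal_pdf(x, mean, std=1.0):
    """Density of a normal distribution (illustrative choice of class density)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fisher_labeled(eta):
    """Fisher information about eta in one labeled sample.

    With f1 and f2 known, only the Bernoulli(eta) label depends on eta.
    """
    return 1.0 / (eta * (1.0 - eta))


def fisher_unlabeled(eta, f1, f2, lo=-20.0, hi=20.0, steps=40000):
    """Fisher information about eta in one unlabeled sample.

    Midpoint-rule approximation of
        integral of (f1 - f2)^2 / (eta*f1 + (1-eta)*f2) dx
    over [lo, hi]; the interval and step count are assumed adequate
    for the densities passed in.
    """
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        a, b = f1(x), f2(x)
        mix = eta * a + (1.0 - eta) * b
        if mix > 0.0:  # skip points where the mixture underflows to zero
            total += (a - b) ** 2 / mix * h
    return total


if __name__ == "__main__":
    eta = 0.5
    f1 = lambda x: normal_pdf(x, -1.0)  # hypothetical class-1 density
    f2 = lambda x: normal_pdf(x, 1.0)   # hypothetical class-2 density
    i_l = fisher_labeled(eta)
    i_u = fisher_unlabeled(eta, f1, f2)
    print("labeled:", i_l, "unlabeled:", i_u, "ratio:", i_l / i_u)
```

In this sketch the ratio `i_l / i_u` plays the role of the relative value described in the abstract. When the two densities barely overlap, the two informations nearly coincide; when the densities are identical, an unlabeled sample carries no information about eta.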