Using asymmetric distributions to improve text classifier probability estimates

28 July 2003

conference paper
Published by Association for Computing Machinery (ACM)

pp. 111-118
https://doi.org/10.1145/860435.860457

Abstract

Text classifiers that give probability estimates are more readily applicable in a variety of scenarios. For example, rather than choosing a single fixed decision threshold, they can be used in a Bayesian risk model to issue a run-time decision that minimizes a user-specified cost function dynamically chosen at prediction time. However, the quality of the probability estimates is crucial. We review a variety of standard approaches to converting scores (and poor probability estimates) from text classifiers to high-quality estimates and introduce new models motivated by the intuition that the empirical score distributions for the "extremely irrelevant", "hard to discriminate", and "obviously relevant" items are often significantly different. Finally, we analyze the experimental performance of these models over the outputs of two text classifiers. The analysis demonstrates that one of these models is theoretically attractive (introducing few new parameters while increasing flexibility), computationally efficient, and empirically preferable.
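The two ideas in the abstract, turning a raw classifier score into a probability and then making a cost-sensitive decision at prediction time, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name the distribution family, so an asymmetric Laplace density (separate decay rates on each side of the mode) stands in for "an asymmetric distribution", and the function names, parameter values, prior, and costs are all hypothetical rather than taken from the paper.

```python
import math

def asym_laplace_pdf(s, theta, beta, gamma):
    """Asymmetric Laplace density with mode theta (illustrative choice).

    beta is the decay rate left of the mode, gamma the rate right of it.
    The factor beta*gamma/(beta+gamma) makes the density integrate to 1.
    """
    norm = beta * gamma / (beta + gamma)
    if s <= theta:
        return norm * math.exp(-beta * (theta - s))
    return norm * math.exp(-gamma * (s - theta))

def posterior_relevant(s, pos, neg, prior_pos):
    """Bayes' rule over class-conditional score densities: P(relevant | s)."""
    num = asym_laplace_pdf(s, *pos) * prior_pos
    den = num + asym_laplace_pdf(s, *neg) * (1.0 - prior_pos)
    return num / den

def min_cost_decision(p_pos, cost_fp, cost_fn):
    """Predict relevant (True) when that choice has lower expected cost.

    Assumes correct decisions cost nothing: predicting relevant risks a
    false positive, predicting irrelevant risks a false negative.
    """
    return (1.0 - p_pos) * cost_fp < p_pos * cost_fn

# Hypothetical fitted parameters (theta, beta, gamma) for each class.
POS = (2.0, 0.5, 2.0)    # relevant items: long tail toward low scores
NEG = (-2.0, 2.0, 0.5)   # irrelevant items: long tail toward high scores
PRIOR_POS = 0.1          # hypothetical fraction of relevant items

p = posterior_relevant(0.0, POS, NEG, PRIOR_POS)
print(min_cost_decision(p, cost_fp=1.0, cost_fn=1.0))
print(min_cost_decision(p, cost_fp=1.0, cost_fn=20.0))
```

The same score leads to different decisions once the cost of a missed relevant item changes, which is the run-time flexibility the abstract attributes to probability estimates and which a single fixed threshold cannot offer.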
