Using asymmetric distributions to improve text classifier probability estimates
- 28 July 2003
- conference paper
- Published by Association for Computing Machinery (ACM)
- p. 111-118
- https://doi.org/10.1145/860435.860457
Abstract
Text classifiers that give probability estimates are more readily applicable in a variety of scenarios. For example, rather than choosing one set decision threshold, they can be used in a Bayesian risk model to issue a run-time decision which minimizes a user-specified cost function dynamically chosen at prediction time. However, the quality of the probability estimates is crucial. We review a variety of standard approaches to converting scores (and poor probability estimates) from text classifiers to high quality estimates and introduce new models motivated by the intuition that the empirical score distributions for the "extremely irrelevant", "hard to discriminate", and "obviously relevant" items are often significantly different. Finally, we analyze the experimental performance of these models over the outputs of two text classifiers. The analysis demonstrates that one of these models is theoretically attractive (introducing few new parameters while increasing flexibility), computationally efficient, and empirically preferable.
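The Bayesian risk decision mentioned in the abstract can be illustrated with a minimal sketch. It assumes a calibrated probability of relevance is already available and that the user supplies a cost table at prediction time. The function names, action labels, and cost values below are hypothetical and are not taken from the paper.

```python
from typing import Dict, Mapping

# Hypothetical cost table layout: cost[action][true_class].
CostTable = Mapping[str, Mapping[str, float]]


def expected_costs(p_relevant: float, cost: CostTable) -> Dict[str, float]:
    """Expected cost of each action, given P(relevant | document)."""
    if not 0.0 <= p_relevant <= 1.0:
        raise ValueError("p_relevant must lie in [0, 1]")
    return {
        action: p_relevant * c["relevant"] + (1.0 - p_relevant) * c["irrelevant"]
        for action, c in cost.items()
    }


def min_risk_decision(p_relevant: float, cost: CostTable) -> str:
    """Return the action with the lowest expected cost (Bayes decision rule)."""
    costs = expected_costs(p_relevant, cost)
    return min(costs, key=costs.get)


# Illustrative cost tables (assumed values): the same probability estimate
# can lead to different decisions when the user changes the costs.
symmetric = {
    "accept": {"relevant": 0.0, "irrelevant": 1.0},
    "reject": {"relevant": 1.0, "irrelevant": 0.0},
}
miss_is_costly = {
    "accept": {"relevant": 0.0, "irrelevant": 1.0},
    "reject": {"relevant": 5.0, "irrelevant": 0.0},
}

print(min_risk_decision(0.3, symmetric))
print(min_risk_decision(0.3, miss_is_costly))
```

Because the decision depends on the numeric value of the estimate and not only on the ranking of documents, a poorly calibrated probability shifts the effective threshold. This is why the abstract stresses the quality of the estimates.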