Boosting for document routing
- 6 November 2000
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
RankBoost is a recently proposed algorithm for learning ranking functions. It is simple to implement and has strong justifications from computational learning theory. We describe the algorithm and present experimental results on applying it to the document routing problem. The first set of results applies RankBoost to a text representation produced using modern term weighting methods. The performance of RankBoost is somewhat inferior to that of a state-of-the-art routing algorithm, which is, however, more complex and less theoretically justified than RankBoost. RankBoost achieves performance comparable to the state-of-the-art algorithm when combined with feature or example selection heuristics. Our second set of results examines the behavior of RankBoost when it has to learn not only a ranking function but also all aspects of term weighting from raw data. Performance here is usually, though not always, worse, but the term weighting functions implicit in the resulting ranking functions are intriguing, and the approach could easily be adapted to mixtures of textual and nontextual data.
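The abstract describes RankBoost only at a high level. As an illustration, here is a minimal sketch of the pairwise RankBoost loop. It assumes binary relevance judgments (every relevant document should rank above every nonrelevant one), dense numeric feature vectors (for example, precomputed term weights), and threshold weak rankers h(x) = 1 if x[j] > θ else 0. Each round it chooses α with the closed form for {0,1}-valued weak rankers. The names `rankboost`, `weak_rank`, and `score` are hypothetical. This is not the implementation used in the paper: it enumerates all relevant/nonrelevant pairs, which is quadratic in the number of documents.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Stump = Tuple[int, float, float]  # (feature index j, threshold theta, weight alpha)


def weak_rank(x: Vector, j: int, theta: float) -> int:
    """Hypothetical binary weak ranker: 1 if feature j exceeds theta, else 0."""
    return 1 if x[j] > theta else 0


def rankboost(relevant: List[Vector], nonrelevant: List[Vector],
              rounds: int = 10) -> List[Stump]:
    """Sketch of pairwise RankBoost with threshold weak rankers (assumed setup)."""
    # Crucial pairs (x0, x1): x1 is relevant and should rank above nonrelevant x0.
    pairs = [(x0, x1) for x1 in relevant for x0 in nonrelevant]
    dist = [1.0 / len(pairs)] * len(pairs)  # uniform initial distribution D_1
    n_features = len(relevant[0])
    all_docs = list(relevant) + list(nonrelevant)
    model: List[Stump] = []
    for _ in range(rounds):
        best = None  # (|r|, r, j, theta)
        for j in range(n_features):
            # Candidate thresholds: distinct observed values of feature j.
            for theta in sorted({x[j] for x in all_docs}):
                # r = sum over pairs of D(x0, x1) * (h(x1) - h(x0))
                r = sum(d * (weak_rank(x1, j, theta) - weak_rank(x0, j, theta))
                        for d, (x0, x1) in zip(dist, pairs))
                if best is None or abs(r) > best[0]:
                    best = (abs(r), r, j, theta)
        _, r, j, theta = best
        r = max(min(r, 1 - 1e-10), -1 + 1e-10)  # clamp to keep alpha finite
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        model.append((j, theta, alpha))
        # Reweight: misordered pairs gain weight, correctly ordered pairs lose it.
        dist = [d * math.exp(alpha * (weak_rank(x0, j, theta) - weak_rank(x1, j, theta)))
                for d, (x0, x1) in zip(dist, pairs)]
        z = sum(dist)  # normalizer Z_t
        dist = [d / z for d in dist]
    return model


def score(model: List[Stump], x: Vector) -> float:
    """Final ranking function H(x) = sum_t alpha_t * h_t(x)."""
    return sum(alpha * weak_rank(x, j, theta) for j, theta, alpha in model)
```

For example, `rankboost([[1, 0], [0, 1]], [[0, 0]], rounds=2)` first picks a stump on feature 0, then shifts weight toward the pair that stump left tied and picks feature 1, so both relevant documents end up scoring above the nonrelevant one. Sorting documents by `score` gives the routing ranking.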