Tagger Evaluation Given Hierarchical Tag Sets
Preprint
10 August 2000
Abstract
We present methods for evaluating human and automatic taggers that extend current practice in three ways. First, we show how to evaluate taggers that assign multiple tags to each test instance, even if they do not assign probabilities. Second, we show how to accommodate a common property of manually constructed "gold standards" that are typically used for objective evaluation, namely that there is often more than one correct answer. Third, we show how to measure performance when the set of possible tags is tree-structured in an IS-A hierarchy. To illustrate how our methods can be used to measure inter-annotator agreement, we show how to compute the kappa coefficient over hierarchical tag sets.
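To make the kappa generalization concrete, the following is a minimal Python sketch. The standard kappa coefficient is (P(A) - P(E)) / (1 - P(E)), where P(A) is observed agreement and P(E) is agreement expected by chance; the sketch generalizes exact-match agreement to a graded `agreement(x, y)` weight in [0, 1]. The shared-ancestry weighting and the toy tag hierarchy here are illustrative assumptions, not the paper's exact formulation.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b, agreement):
    """Weighted kappa: a graded agreement(x, y) in [0, 1] replaces exact match."""
    n = len(labels_a)
    # Observed agreement, averaged over paired annotations.
    p_obs = sum(agreement(a, b) for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independent annotator marginals.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_exp = sum(
        (freq_a[x] / n) * (freq_b[y] / n) * agreement(x, y)
        for x in freq_a for y in freq_b
    )
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical IS-A hierarchy, child -> parent (root maps to None).
PARENT = {"dog": "mammal", "cat": "mammal", "mammal": "animal",
          "bird": "animal", "animal": None}

def ancestors(tag):
    """Path from a tag up to the root of the IS-A tree, inclusive."""
    path = [tag]
    while PARENT.get(path[-1]) is not None:
        path.append(PARENT[path[-1]])
    return path

def tree_agreement(x, y):
    # Partial credit proportional to shared ancestry: identical tags score 1,
    # siblings score less, unrelated tags near 0. One of several plausible choices.
    ax, ay = ancestors(x), ancestors(y)
    common = len(set(ax) & set(ay))
    return common / max(len(ax), len(ay))

# Example: the second item disagrees, but only by one level of the hierarchy,
# so it earns partial credit instead of a flat zero.
annotator_1 = ["dog", "cat", "bird"]
annotator_2 = ["dog", "mammal", "bird"]
print(cohens_kappa(annotator_1, annotator_2, tree_agreement))
```

With an exact-match agreement function (`lambda x, y: float(x == y)`) this reduces to the standard kappa coefficient, so the hierarchical version is a strict generalization.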