Abstract
I present an entropy measure for evaluating parser performance. The measure is fine-grained and permits evaluation of performance at the level of individual phrases. The parsing problem is characterized as statistically approximating the Penn Treebank annotations. I consider a series of models to "calibrate" the measure by determining what scores can be achieved using the most obvious kinds of information. I also relate the entropy measure to measures of recall/precision and grammar coverage.
