A Critique and Improvement of an Evaluation Metric for Text Segmentation
- 1 March 2002
- journal article
- Published by MIT Press in Computational Linguistics
- Vol. 28 (1), 19–36
- https://doi.org/10.1162/089120102317341756
Abstract
The Pk evaluation metric, initially proposed by Beeferman, Berger, and Lafferty (1997), is becoming the standard measure for assessing text segmentation algorithms. However, a theoretical analysis of the metric finds several problems: the metric penalizes false negatives more heavily than false positives, overpenalizes near misses, and is affected by variation in segment size distribution. We propose a simple modification to the Pk metric that remedies these problems. This new metric—called WindowDiff—moves a fixed-sized window across the text and penalizes the algorithm whenever the number of boundaries within the window does not match the true number of boundaries for that window of text.
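The sliding-window comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' reference implementation; the boundary-string encoding and the default window size (half the mean reference segment length, a convention noted in the literature) are assumptions of this sketch.

```python
def window_diff(reference, hypothesis, k=None):
    """Sketch of the WindowDiff metric.

    `reference` and `hypothesis` are equal-length sequences of 0/1 flags,
    where 1 marks a segment boundary at that position. A window of size k
    slides over both; each position where the boundary counts inside the
    two windows differ counts as one error.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must be the same length")
    n = len(reference)
    if k is None:
        # Assumed convention: half the mean reference segment length.
        num_segments = sum(reference) + 1
        k = max(1, round(n / num_segments / 2))
    errors = 0
    for i in range(n - k):
        ref_count = sum(reference[i:i + k])
        hyp_count = sum(hypothesis[i:i + k])
        if ref_count != hyp_count:
            errors += 1
    # Normalize by the number of window positions.
    return errors / (n - k)
```

Because the penalty depends on how far a hypothesized boundary drifts from the true one, a near miss (boundary off by one position) is penalized less than a complete miss, which is the behavior the modification is designed to achieve.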
References
- Statistical Models for Text Segmentation. Machine Learning, 1999
- Automatic Analysis, Theme Generation, and Summarization of Machine-Readable Texts. Science, 1994
- A Note on the Generation of Random Normal Deviates. The Annals of Mathematical Statistics, 1958