A new approach to unsupervised text summarization
- 1 September 2001
- Proceedings article
- Published by Association for Computing Machinery (ACM)
Abstract
The paper presents a novel approach to unsupervised text summarization. The novelty lies in exploiting the diversity of concepts in a text for summarization, an idea that has received little attention in the summarization literature. The diversity-based approach presented here is a principled generalization of the Maximal Marginal Relevance (MMR) criterion of Carbonell and Goldstein (1998). We also propose an information-centric approach to evaluation, in which the quality of summaries is judged not by how well they match human-created summaries but by how well they represent their source documents in IR tasks such as document retrieval and text categorization. To assess the effectiveness of our approach under the proposed evaluation scheme, we examine how a system with the diversity functionality performs against one without it, using the BMIR-J2 corpus, a test collection developed by a Japanese research consortium. The results demonstrate the clear superiority of the diversity-based approach over the non-diversity-based one.
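The diversity idea the abstract builds on can be illustrated with the original MMR criterion, which greedily picks the sentence that maximizes λ·Sim(s, Q) − (1 − λ)·max over already-selected s' of Sim(s, s'). The sketch below is a minimal illustration of plain MMR, not the paper's generalized method. Several choices are assumptions made only for the example: whitespace tokenization, raw term-count cosine similarity, using the whole document's term vector as a pseudo-query Q (since unsupervised summarization has no user query), the default λ = 0.7, and the hypothetical function name `mmr_summarize`.

```python
from collections import Counter
import math


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_summarize(sentences, k, lam=0.7):
    """Greedy MMR sentence extraction (hypothetical helper).

    Returns the indices of the k selected sentences in document order.
    Assumption: the summed term counts of the whole document act as the query Q.
    """
    vecs = [Counter(s.lower().split()) for s in sentences]
    query = sum(vecs, Counter())  # document term vector as a pseudo-query
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vecs[i], query)
            # Redundancy: similarity to the closest sentence already chosen.
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)


if __name__ == "__main__":
    doc = ["the cat sat on the mat", "the cat sat on the mat", "dogs bark loudly at night"]
    print(mmr_summarize(doc, 2, lam=0.5))  # diversity term skips the duplicate
    print(mmr_summarize(doc, 2, lam=1.0))  # pure relevance keeps the duplicate
```

Setting λ = 1 removes the redundancy penalty and reduces the procedure to ranking by relevance alone, which corresponds loosely to the "system without diversity functionality" that the abstract compares against; lower λ trades relevance for coverage of distinct content.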
References
- Centroid-based summarization of multiple documents. Association for Computational Linguistics (ACL), 2000.
- Query-relevant summarization using FAQs. Association for Computational Linguistics (ACL), 2000.
- The automatic construction of large-scale corpora for summarization research. Association for Computing Machinery (ACM), 1999.
- The use of MMR, diversity-based reranking for reordering documents and producing summaries. Association for Computing Machinery (ACM), 1998.
- Stochastic Complexity in Learning. Journal of Computer and System Sciences, 1997.
- The rhetorical parsing of natural language texts. Association for Computational Linguistics (ACL), 1997.
- Distribution of content words and phrases in text and language modelling. Natural Language Engineering, 1996.
- Fast generation of abstracts from general domain text corpora by extracting relevant sentences. Association for Computational Linguistics (ACL), 1996.
- A trainable document summarizer. Association for Computing Machinery (ACM), 1995.
- New Methods in Automatic Extracting. Journal of the ACM, 1969.