Generic text summarization using relevance measure and latent semantic analysis
- 1 September 2001
- proceedings article
- Published by Association for Computing Machinery (ACM)
Abstract
In this paper, we propose two generic text summarization methods that create text summaries by ranking and extracting sentences from the original documents. The first method uses standard IR methods to rank sentence relevances, while the second method uses the latent semantic analysis technique to identify semantically important sentences, for summary creations. Both methods strive to select sentences that are highly ranked and different from each other. This is an attempt to create a summary with a wider coverage of the document's main content and less redundancy. Performance evaluations on the two summarization methods are conducted by comparing their summarization outputs with the manual summaries generated by three independent human evaluators. The evaluations also study the influence of different VSM weighting schemes on the text summarization performances. Finally, the causes of the large disparities in the evaluators' manual summarization results are investigated, and discussions on human text summarization patterns are presented.
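The first method in the abstract (rank sentences by relevance, then prefer sentences that differ from those already chosen) can be illustrated with a minimal Python sketch. This is one plausible reading, not the authors' implementation: the abstract does not give the weighting scheme or the redundancy rule, so the raw term-frequency vectors, the cosine relevance score, and the step that discards the terms of each selected sentence are all assumptions, and the names `tokenize`, `cosine`, and `summarize` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, k):
    """Greedily pick up to k sentences, returned in document order."""
    # Assumption: raw term frequency is the VSM weighting.
    vectors = [Counter(tokenize(s)) for s in sentences]
    remaining = set(range(len(sentences)))
    chosen = []
    while remaining and len(chosen) < k:
        # Document vector: summed term counts of the sentences still in play.
        doc = Counter()
        for i in remaining:
            doc.update(vectors[i])
        # Relevance of a sentence: cosine between it and the document vector.
        best = max(sorted(remaining), key=lambda i: cosine(vectors[i], doc))
        if cosine(vectors[best], doc) == 0.0:
            break  # nothing informative is left
        chosen.append(best)
        remaining.remove(best)
        # Assumed redundancy rule: terms already covered by the chosen
        # sentence are removed from every remaining sentence.
        covered = set(vectors[best])
        for i in remaining:
            for term in covered:
                vectors[i].pop(term, None)
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    doc_sentences = [
        "Cats chase mice.",
        "Cats chase mice quickly.",
        "Stock markets fell sharply today.",
    ]
    for sentence in summarize(doc_sentences, 2):
        print(sentence)
```

On the three-sentence example, the second sentence is selected first, after which the near-duplicate first sentence has no uncovered terms left, so the unrelated third sentence is chosen next. That is the "highly ranked and different from each other" behavior the abstract describes, under the assumptions stated above.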
This publication has 3 references indexed in Scilit:
- Summarizing text documents. Published by Association for Computing Machinery (ACM), 1999
- Accurate user directed summarization from existing tools. Published by Association for Computing Machinery (ACM), 1998
- Indexing by latent semantic analysis. Journal of the American Society for Information Science, 1990