Abstract
Clustering the output of a multi‐database online search enables a user to obtain an overview of the information that has been retrieved without the need to inspect any documents that contain only redundant information. In this paper, we describe a classification scheme that characterises the degree of relationship between pairs of documents in database search‐outputs and then report the application of a range of clustering methods and similarity coefficients to 20 such outputs. These experiments demonstrate that clustering is capable of grouping documents that are identical to, or closely related to, other documents in the search‐output on the basis of their term similarities.
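
As an illustration of grouping documents by term similarity, the following is a minimal Python sketch and not the method used in the paper. The abstract does not name the clustering methods or similarity coefficients that were tested, so the Dice coefficient, the thresholded single-linkage grouping, the 0.7 threshold, and the record identifiers and index terms are all assumptions chosen for the example.

```python
from itertools import combinations


def dice(a: set, b: set) -> float:
    """Dice coefficient between two term sets (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def single_link_clusters(docs: dict, threshold: float) -> list:
    """Group documents whose term-set similarity reaches the threshold.

    Thresholded single linkage: two documents share a cluster if a chain of
    pairwise similarities >= threshold connects them (tracked by union-find).
    """
    parent = {name: name for name in docs}

    def find(x):
        # Follow parent links to the cluster representative, compressing paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(docs, 2):
        if dice(docs[a], docs[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in docs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical search output: record id -> set of index terms (assumed data).
search_output = {
    "db1:001": {"cluster", "analysis", "document", "retrieval"},
    "db2:417": {"cluster", "analysis", "document", "retrieval"},  # duplicate
    "db3:092": {"cluster", "analysis", "document", "search"},     # closely related
    "db1:230": {"protein", "folding", "simulation"},              # unrelated
}

groups = single_link_clusters(search_output, threshold=0.7)
print(groups)
```

In this sketch the duplicate and the closely related record fall into one group with the first record, while the unrelated record stays on its own, so a user would need to inspect only one representative per group.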
