EFFICIENCY OF HIERARCHIC AGGLOMERATIVE CLUSTERING USING THE ICL DISTRIBUTED ARRAY PROCESSOR
- 1 January 1989
- journal article
- Published by Emerald Publishing in Journal of Documentation
- Vol. 45 (1) , 1-24
- https://doi.org/10.1108/eb026836
Abstract
The implementation of hierarchic agglomerative methods of cluster anlaysis for large datasets is very demanding of computational resources when implemented on conventional computers. The ICL Distributed Array Processor (DAP) allows many of the scanning and matching operations required in clustering to be carried out in parallel. Experiments are described using the single linkage and Ward's hierarchical agglomerative clustering methods on both real and simulated datasets. Clustering runs on the DAP are compared with the most efficient algorithms currently available implemented on an IBM 3083 BX. The DAP is found to be 2.9–7.9 times as fast as the IBM, the exact degree of speed‐up depending on the size of the dataset, the clustering method, and the serial clustering algorithm that is used. An analysis of the cycle times of the two machines is presented which suggests that further, very substantial speed‐ups could be obtained from array processors of this type if they were to be based on more powerful processing elements.Keywords
This publication has 28 references indexed in Scilit:
- Automatic classification of chemical structure databases using a highly parallel array processorJournal of Computational Chemistry, 1988
- Parallel text search methodsCommunications of the ACM, 1988
- Parallel free-text search on the connection machine systemCommunications of the ACM, 1986
- An evaluation of document retrieval from serial files using the ICL Distributed Array ProcessorOnline Review, 1984
- HIERARCHIC AGGLOMERATIVE CLUSTERING METHODS FOR AUTOMATIC DOCUMENT CLASSIFICATIONJournal of Documentation, 1984
- The Distributed Array Processor (DAP)Computer Physics Communications, 1983
- Drooling — A non-parametric multidimensional clustering algorithm for distributed array processorComputer Physics Communications, 1982
- Algorithm 422: minimal spanning tree [H]Communications of the ACM, 1972
- A Review of ClassificationJournal of the Royal Statistical Society. Series A (General), 1971
- A note on two problems in connexion with graphsNumerische Mathematik, 1959