On probability distributions of single-linkage dendrograms

Abstract
There are [N(N-1)/2]! ways to order the N(N-1)/2 pairwise similarities between N objects, assuming no ties. Under single-linkage (SL) clustering, each such order determines a dendrogram for the N objects. We give an algorithm for calculating the number of different SL-dendrograms on N objects. We also give an algorithm for calculating the probability distribution of the SL-dendrograms under pure randomness, i.e. assuming that all similarity orders are equally probable. The results are used to illustrate, for small values of N, the statistical risks incurred when SL-dendrograms are used to test cluster structure hypotheses.
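For very small N, the distribution described above can be checked by brute force. The sketch below is not the algorithm of the paper: it enumerates every order of the pairwise similarities, runs single linkage on each, and tallies the resulting dendrograms. It assumes that a dendrogram is identified by its ordered sequence of merges, which may differ from the paper's definition, and the names sl_dendrogram and sl_distribution are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations


def sl_dendrogram(order, n):
    """Single-linkage merge sequence for one similarity order.

    `order` lists the object pairs (i, j) from most to least similar.
    Assumption: a dendrogram is represented by its ordered merges.
    """
    cluster = {i: frozenset([i]) for i in range(n)}
    merges = []
    for i, j in order:
        a, b = cluster[i], cluster[j]
        if a == b:
            continue  # both objects already share a cluster
        merged = a | b
        merges.append(frozenset([a, b]))
        for k in merged:
            cluster[k] = merged
        if len(merges) == n - 1:
            break  # all objects are joined
    return tuple(merges)


def sl_distribution(n):
    """Exact dendrogram probabilities when all orders are equally likely.

    Enumerates all [n(n-1)/2]! orders, so it is feasible only for tiny n.
    """
    pairs = list(combinations(range(n), 2))
    counts = Counter(sl_dendrogram(p, n) for p in permutations(pairs))
    total = sum(counts.values())
    return {d: Fraction(c, total) for d, c in counts.items()}


if __name__ == "__main__":
    for n in (3, 4):
        dist = sl_distribution(n)
        print(n, len(dist), sorted(set(dist.values())))
```

Under this representation, N = 4 gives 18 distinct merge sequences from the 720 orders: those that split the objects into two pairs each have probability 1/30, and the remaining ones each have probability 1/15. The enumeration grows factorially, so N = 5 already requires 10! = 3,628,800 orders.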
