Superparamagnetic clustering of data

Abstract
The physical aspects of a recently introduced method for data clustering are considered in detail. This method is based on an inhomogeneous Potts model; no assumption concerning the underlying distribution of the data is made. A Potts spin is assigned to each data point, and short-range interactions between neighboring points are introduced. Spin-spin correlations (measured by Monte Carlo computations) serve to partition the data points into clusters. In this paper, we examine the effects of varying different details of the method, such as the definition of neighbors, the type of interaction, and the number of Potts states q. In addition, we present and solve a granular mean-field Potts model relevant to the clustering method. The model consists of strongly coupled groups of spins coupled to noise spins, which are themselves weakly coupled. The phase diagram is computed by solving the model analytically in various limits. Our main result is that, in the range of parameters of interest, the existence of the superparamagnetic phase is independent of the ordering process of the noise spins. Next, we use the known properties of regular and inhomogeneous Potts models in finite dimensions to discuss the performance of the clustering method. In particular, the spatial resolution of the clustering method is argued to be connected to the correlation length of spin fluctuations. The behavior of the method as more and more data points are sampled is also investigated.
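
The pipeline summarized in the abstract (one Potts spin per data point, short-range couplings between neighbors, Monte Carlo estimates of spin-spin correlations, then a partition into clusters) can be illustrated with a minimal sketch. The abstract does not fix the details, so everything specific here is an assumption made for illustration: the mutual k-nearest-neighbor definition of neighbors, the Gaussian coupling exp(-d^2 / 2a^2) with a set to the mean neighbor distance, Swendsen-Wang updates as the Monte Carlo scheme, the 0.5 correlation threshold, and all function names and default parameter values.

```python
import numpy as np

def knn_edges(points, k):
    """Edges (i < j) between mutual k-nearest neighbors, with their distances."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nearest = np.argsort(d, axis=1)[:, :k]
    is_nb = np.zeros_like(d, dtype=bool)
    is_nb[np.arange(len(points))[:, None], nearest] = True
    i, j = np.nonzero(np.triu(is_nb & is_nb.T, 1))  # keep mutual pairs once
    return i, j, d[i, j]

def components(n, i, j):
    """Label the connected components of a graph on n nodes (union-find)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in zip(i.tolist(), j.tolist()):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    roots = np.array([find(x) for x in range(n)])
    return np.unique(roots, return_inverse=True)[1]

def spc_labels(points, k=5, q=10, temperature=0.05,
               sweeps=200, burn_in=50, seed=0):
    """Cluster points from Potts spin-spin correlations at one temperature."""
    rng = np.random.default_rng(seed)
    n = len(points)
    i, j, dist = knn_edges(points, k)
    a = dist.mean()                               # assumed local length scale
    coupling = np.exp(-dist**2 / (2 * a**2))      # assumed short-range J_ij
    p_freeze = 1.0 - np.exp(-coupling / temperature)
    spins = rng.integers(q, size=n)               # random initial Potts states
    same_count = np.zeros(len(i))
    for sweep in range(sweeps):
        # Swendsen-Wang step: freeze bonds between equal spins, then give
        # every resulting group of sites a fresh random state.
        frozen = (spins[i] == spins[j]) & (rng.random(len(i)) < p_freeze)
        sw = components(n, i[frozen], j[frozen])
        spins = rng.integers(q, size=sw.max() + 1)[sw]
        if sweep >= burn_in:
            same_count += spins[i] == spins[j]
    corr = same_count / (sweeps - burn_in)        # estimate of <delta(s_i, s_j)>
    linked = corr > 0.5                           # assumed correlation threshold
    return components(n, i[linked], j[linked]), corr
```

Calling `spc_labels(points)` on an array of shape `(n, dim)` returns one integer label per point together with the estimated correlation on each neighbor edge. Scanning `temperature` from low to high values is one way to look for the intermediate regime in which tightly coupled groups stay internally aligned while different groups decorrelate.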
