Data clustering and noise undressing of correlation matrices

Abstract
We discuss a new approach to data clustering. We find that maximum likelihood leads naturally to a Hamiltonian of Potts variables which depends on the correlation matrix and whose low-temperature behavior describes the correlation structure of the data. For random, uncorrelated data sets, no correlation structure emerges. For data sets with a built-in cluster structure, on the other hand, the method detects and recovers that structure efficiently. Finally, we apply the method to financial time series, where the low-temperature behavior reveals a nontrivial clustering.

Comment: 8 pages, 5 figures, completely rewritten and enlarged version of cond-mat/0003241. Submitted to Phys. Rev.
