Biclustering algorithms for biological data analysis: a survey
- 24 August 2004
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE/ACM Transactions on Computational Biology and Bioinformatics
- Vol. 1 (1) , 24-45
- https://doi.org/10.1109/tcbb.2004.2
Abstract
A large number of clustering approaches have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the results from the application of standard clustering methods to genes are limited. This limitation is imposed by the existence of a number of experimental conditions where the activity of genes is uncorrelated. A similar limitation exists when clustering of conditions is performed. For this reason, a number of algorithms that perform simultaneous clustering on the row and column dimensions of the data matrix have been proposed. The goal is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. In this paper, we refer to this class of algorithms as biclustering. Biclustering is also referred to in the literature as coclustering and direct clustering, among other names, and has also been used in fields such as information retrieval and data mining. In this comprehensive survey, we analyze a large number of existing approaches to biclustering, and classify them in accordance with the type of biclusters they can find, the patterns of biclusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications.
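The idea of a bicluster as a submatrix can be illustrated with a minimal sketch. It is not taken from the survey: the toy expression matrix, the helper names (`pearson`, `submatrix`, `min_pairwise_correlation`), and the choice of Pearson correlation as the coherence measure are all assumptions made for illustration, and the surveyed algorithms use a variety of other criteria. The sketch only shows why a subset of genes can look uncorrelated across all conditions yet be highly correlated on a subset of conditions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def submatrix(matrix, rows, cols):
    """Select the given row (gene) and column (condition) indices."""
    return [[matrix[r][c] for c in cols] for r in rows]


def min_pairwise_correlation(rows):
    """Lowest Pearson correlation over all pairs of rows."""
    return min(pearson(a, b) for a, b in combinations(rows, 2))


# Hypothetical toy data: 4 genes (rows) x 5 conditions (columns).
expression = [
    [1.0, 9.0, 2.0, 4.0, 3.0],  # gene 0
    [2.0, 1.0, 4.0, 8.0, 6.0],  # gene 1
    [3.0, 5.0, 6.0, 2.0, 9.0],  # gene 2
    [7.0, 3.0, 1.0, 6.0, 2.0],  # gene 3
]

# Candidate bicluster (assumed, not discovered by an algorithm):
# genes 0-2 restricted to conditions 0, 2 and 4.
genes = [0, 1, 2]
conditions = [0, 2, 4]

# Coherence of the same genes inside the bicluster vs. across all conditions.
inside = min_pairwise_correlation(submatrix(expression, genes, conditions))
overall = min_pairwise_correlation(submatrix(expression, genes, range(5)))

print("min correlation inside bicluster:", round(inside, 3))
print("min correlation over all conditions:", round(overall, 3))
```

In this toy matrix the three selected genes rise together on conditions 0, 2, and 4, while the remaining conditions break the pattern. Clustering the full rows would therefore miss the group, which is the limitation of standard clustering that the abstract describes.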