Biclustering of gene expression data by non-smooth non-negative matrix factorization
Open Access
- 17 February 2006
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 7 (1), 78
- https://doi.org/10.1186/1471-2105-7-78
Abstract
The widespread use of microarray technologies has enabled the generation and accumulation of gene expression datasets that contain expression levels of thousands of genes across tens or hundreds of different experimental conditions. One of the major challenges in the analysis of such datasets is to discover local structures composed of sets of genes that show coherent expression patterns across subsets of experimental conditions. These patterns may provide clues about the main biological processes associated with different physiological states. In this work we present a methodology able to cluster genes and conditions that are highly related in sub-portions of the data. Our approach is based on a new data mining technique, Non-smooth Non-Negative Matrix Factorization (nsNMF), which is able to identify localized patterns in large datasets. We assessed the potential of this methodology by analyzing several synthetic datasets as well as two large and heterogeneous sets of gene expression profiles. In all cases the method was able to identify localized features related to sets of genes that show consistent expression patterns across subsets of experimental conditions. The uncovered structures showed a clear biological meaning in terms of relationships among functional annotations of genes and the phenotypes or physiological states of the associated conditions. The proposed approach can be a useful tool for analyzing large and heterogeneous gene expression datasets. The method is able to identify complex relationships among genes and conditions that are difficult to identify with standard clustering algorithms.
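The abstract names nsNMF without giving the factorization itself, so the following is only a rough sketch of the idea, not the authors' implementation. It assumes the commonly cited form V ≈ W S H, where S = (1 − θ)I + (θ/k)11ᵀ is a smoothing matrix and θ in [0, 1] controls how strongly sparseness is encouraged in W and H. For brevity it uses standard squared-error multiplicative updates, which may differ from the objective used in the article. All function names, parameters and the toy data are hypothetical.

```python
import numpy as np


def smoothing_matrix(k, theta):
    """Assumed nsNMF smoothing matrix: (1 - theta) * I + (theta / k) * ones."""
    return (1.0 - theta) * np.eye(k) + (theta / k) * np.ones((k, k))


def nsnmf(V, k, theta=0.5, n_iter=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (genes x conditions) as W @ S @ H.

    Uses squared-error multiplicative updates (an assumption made here for
    simplicity), treating W @ S as the basis when updating H and S @ H as
    the encoding when updating W.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_conditions = V.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_conditions)) + eps
    S = smoothing_matrix(k, theta)
    for _ in range(n_iter):
        WS = W @ S
        H *= (WS.T @ V) / (WS.T @ WS @ H + eps)
        SH = S @ H
        W *= (V @ SH.T) / (W @ SH @ SH.T + eps)
    return W, S, H


def bicluster_members(W, H, factor, n_genes, n_conditions):
    """Return indices of the genes and conditions with the largest
    coefficients for one factor (a simple, hypothetical selection rule)."""
    genes = np.argsort(W[:, factor])[::-1][:n_genes]
    conditions = np.argsort(H[factor, :])[::-1][:n_conditions]
    return genes, conditions


if __name__ == "__main__":
    # Toy data: low-level noise plus one planted block of co-expressed
    # genes (rows 0-14) over a subset of conditions (columns 0-4).
    rng = np.random.default_rng(1)
    V = rng.random((60, 20)) * 0.1
    V[:15, :5] += 1.0
    W, S, H = nsnmf(V, k=2, theta=0.5)
    for factor in range(2):
        genes, conditions = bicluster_members(W, H, factor, 15, 5)
        print(factor, sorted(genes.tolist()), sorted(conditions.tolist()))
```

In this reading, each factor pairs a column of W (gene weights) with a row of H (condition weights), and a bicluster is read off by keeping the highest-weighted genes and conditions of that factor. The fixed top-n cut-off used here is a placeholder for whatever selection criterion a real analysis would apply.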