Clustering of the SOM easily reveals distinct gene expression patterns: results of a reanalysis of lymphoma study
Open Access
- 24 November 2002
- journal article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 3 (1) , 36
- https://doi.org/10.1186/1471-2105-3-36
Abstract
A method to evaluate and analyze the massive data generated by series of microarray experiments is of utmost importance for revealing the hidden patterns of gene expression. Because of the complexity and high dimensionality of microarray gene expression profiles, the dimensional reduction of raw expression data and the feature selection needed for tasks such as classification of disease samples remain a challenge. To address this problem, we propose a two-level analysis. First, a self-organizing map (SOM) is used. The SOM is a vector quantization method that simplifies and reduces the dimensionality of the original measurements and visualizes each individual tumor sample in a SOM component plane. Next, hierarchical clustering and K-means clustering are used to identify patterns of gene expression useful for classification of samples. We tested the two-level analysis on public data from diffuse large B-cell lymphomas. The analysis easily distinguished four major gene expression patterns without the need for supervision: a germinal center-related, a proliferation, an inflammatory and a plasma cell differentiation-related gene expression pattern. The first three patterns matched those described in the original publication, which used supervised clustering analysis, whereas the fourth was novel. Our study shows that using the SOM as an intermediate step in the analysis of genome-wide gene expression data makes the gene expression patterns easier to reveal. The "expression display" provided by the SOM component plane summarises the complicated data in a way that allows the clinician to evaluate the classification options rather than giving a fixed diagnosis.
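The two-level idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a small rectangular SOM with a Gaussian neighbourhood, a linearly decaying learning rate, and a plain K-means with deterministic farthest-point initialisation applied to the SOM prototype vectors (the hierarchical clustering step is omitted). All function names, parameter values and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(protos, x):
    """Index of the prototype closest to x."""
    return min(range(len(protos)), key=lambda j: sq_dist(protos[j], x))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Level 1: online SOM training; returns prototypes in row-major grid order."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Assumption: prototypes are initialised from randomly chosen input profiles.
    protos = [list(rng.choice(data)) for _ in grid]
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays to 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood shrinks to 0.5
            br, bc = grid[best_matching_unit(protos, x)]
            for j, (r, c) in enumerate(grid):
                # Gaussian neighbourhood weight around the best-matching unit.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                protos[j] = [w + lr * h * (xi - w) for w, xi in zip(protos[j], x)]
            step += 1
    return protos


def kmeans(points, k, iters=100):
    """Level 2: Lloyd's K-means with farthest-point initialisation (a simplification)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return labels, centers


def two_level_cluster(data, rows, cols, k, seed=0):
    """Train the SOM, cluster its prototypes, then label each profile via its BMU."""
    protos = train_som(data, rows, cols, seed=seed)
    proto_labels, _ = kmeans(protos, k)
    labels = [proto_labels[best_matching_unit(protos, x)] for x in data]
    return labels, protos


def make_toy_data(n_per_group=10, dim=4, seed=1):
    """Hypothetical toy profiles: two well-separated groups, not real microarray data."""
    rng = random.Random(seed)
    low = [[rng.gauss(0.0, 0.3) for _ in range(dim)] for _ in range(n_per_group)]
    high = [[rng.gauss(10.0, 0.3) for _ in range(dim)] for _ in range(n_per_group)]
    return low + high


if __name__ == "__main__":
    labels, protos = two_level_cluster(make_toy_data(), rows=3, cols=3, k=2)
    print(labels)
```

Clustering the handful of prototypes rather than the raw profiles is what makes the second level cheap: the SOM has already compressed the data, so K-means (or a hierarchical method) only has to group `rows * cols` vectors, and each original profile inherits the label of its best-matching unit.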