Improving molecular cancer class discovery through sparse non-negative matrix factorization
Open Access
- 8 September 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (21) , 3970-3975
- https://doi.org/10.1093/bioinformatics/bti653
Abstract
Motivation: Identifying different cancer classes or subclasses with similar morphological appearances is a challenging problem with important implications for cancer diagnosis and treatment. Clustering based on gene-expression data has been shown to be a powerful method for cancer class discovery. Non-negative matrix factorization is one such method and was shown to be advantageous over other clustering techniques, such as hierarchical clustering or self-organizing maps. In this paper, we investigate the benefit of explicitly enforcing sparseness in the factorization process.
Results: We report an improved unsupervised method for cancer classification from gene-expression profiles using sparse non-negative matrix factorization. We demonstrate the improvement by direct comparison with classic non-negative matrix factorization on three well-studied datasets. In addition, we illustrate how to identify a small subset of co-expressed genes that may be directly involved in cancer.
Contact: g1m1c1@receptor.med.harvard.edu, ygao@receptor.med.harvard.edu
Supplementary information: http://arep.med.harvard.edu/snmf/supplement.htm
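The general idea described in the abstract, factorizing a non-negative gene-expression matrix while encouraging sparse sample loadings and then reading cluster labels off those loadings, might be sketched as follows. This is a minimal illustration only, not the authors' algorithm: it assumes a generic multiplicative-update NMF with an L1 penalty on the coefficient matrix, and the names `sparse_nmf`, `toy_expression_matrix`, `lam`, and the toy data are all hypothetical.

```python
import numpy as np


def sparse_nmf(A, k, lam=0.1, n_iter=500, seed=0, eps=1e-9):
    """Approximate A (genes x samples) as W @ H with W, H >= 0.

    Assumed objective (not taken from the paper):
        0.5 * ||A - W @ H||_F^2 + lam * sum(H)
    minimized with multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = A.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # The L1 penalty appears as `lam` in the denominator and
        # shrinks small entries of H towards zero.
        H *= (W.T @ A) / (W.T @ W @ H + lam + eps)
        # Standard multiplicative update for W (no penalty on W).
        W *= (A @ H.T) / (W @ H @ H.T + eps)
        # Rescale so each column of W has unit norm; W @ H is unchanged,
        # which keeps the penalty on H from being undone by rescaling.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H


def toy_expression_matrix(seed=1):
    """Hypothetical data: 40 genes x 10 samples with two sample groups."""
    rng = np.random.default_rng(seed)
    A = 0.1 * rng.random((40, 10))   # small non-negative background
    A[:20, :5] += 5.0                # genes 0-19 are high in samples 0-4
    A[20:, 5:] += 5.0                # genes 20-39 are high in samples 5-9
    return A


A = toy_expression_matrix()
W, H = sparse_nmf(A, k=2)
# Assign each sample to the factor with the largest coefficient.
labels = H.argmax(axis=0)
print(labels)
```

Each column of `W` plays the role of a gene pattern and each column of `H` gives one sample's loadings on those patterns, so genes with large entries in a column of `W` are candidates for a co-expressed subset. The penalty weight `lam` and the number of factors `k` are free choices in this sketch and would need tuning on real data.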
This publication has 17 references indexed in Scilit:
- Metagenes and molecular pattern discovery using matrix factorization. Proceedings of the National Academy of Sciences, 2004
- PCA disjoint models for multiclass cancer analysis using gene expression data. Bioinformatics, 2003
- Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Machine Learning, 2003
- Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature, 2000
- Tissue Classification with Gene Expression Profiles. Journal of Computational Biology, 2000
- Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 2000
- Learning the parts of objects by non-negative matrix factorization. Nature, 1999
- Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, 1999
- Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences, 1999
- Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences, 1998