Metagene projection for cross-platform, cross-species characterization of global transcriptional states
- 3 April 2007
- journal article
- research article
- Published by Proceedings of the National Academy of Sciences in Proceedings of the National Academy of Sciences
- Vol. 104 (14) , 5959-5964
- https://doi.org/10.1073/pnas.0701068104
Abstract
The high dimensionality of global transcription profiles, the expression level of 20,000 genes in a much smaller number of samples, presents challenges that affect the sensitivity and general applicability of analysis results. In principle, it would be better to describe the data in terms of a small number of metagenes, positive linear combinations of genes, which could reduce noise while still capturing the invariant biological features of the data. Here, we describe how to accomplish such a reduction in dimension by a metagene projection methodology, which can greatly reduce the number of features used to characterize microarray data. We show, in applications to the analysis of leukemia and lung cancer data sets, how this approach can help assess and interpret similarities and differences between independent data sets, enable cross-platform and cross-species analysis, improve clustering and class prediction, and provide a computational means to detect and remove sample contamination.
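The idea of metagenes as positive linear combinations of genes can be illustrated with a small sketch. The abstract does not spell out the algorithm, so the code below makes assumptions: it uses non-negative matrix factorization (NMF) with multiplicative updates to learn the metagenes from a model data set, and a Moore-Penrose pseudo-inverse to project a second data set onto them. The function names `nmf` and `project`, the toy matrices, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def nmf(A, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate a non-negative genes x samples matrix A as W @ H.

    W (genes x k) holds the metagene definitions; H (k x samples) holds
    each sample's metagene expression. Multiplicative updates keep both
    factors non-negative (an assumed choice of algorithm).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = A.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


def project(W, A_new):
    """Project new samples (same gene rows as W) into metagene space.

    Uses the pseudo-inverse of W; this is an assumption, and the
    projected values are not guaranteed to be non-negative.
    """
    return np.linalg.pinv(W) @ A_new


# Toy "model" data set: 200 genes, 12 samples, 3 underlying metagenes.
rng = np.random.default_rng(1)
true_W = rng.random((200, 3))
A = true_W @ rng.random((3, 12))

W, H = nmf(A, k=3)
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)

# Toy "test" data set measured on the same genes, projected onto W.
A_new = true_W @ rng.random((3, 5))
H_new = project(W, A_new)

print("metagene matrix:", W.shape)
print("model samples in metagene space:", H.shape)
print("projected samples in metagene space:", H_new.shape)
print("relative reconstruction error:", rel_err)
```

In this sketch, each sample is described by 3 numbers instead of 200, which is the kind of dimension reduction the abstract refers to. Clustering or class prediction would then operate on the columns of `H` and `H_new` rather than on the full gene-level profiles.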