Discovering Biological Progression Underlying Microarray Samples
Open Access
- 14 April 2011
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 7 (4), e1001123
- https://doi.org/10.1371/journal.pcbi.1001123
Abstract
In biological systems that undergo processes such as differentiation, a clear concept of progression exists. We present a novel computational approach, called Sample Progression Discovery (SPD), to discover patterns of biological progression underlying microarray gene expression data. SPD assumes that individual samples of a microarray dataset are related by an unknown biological process (e.g., differentiation, development, cell cycle, disease progression), and that each sample represents one unknown point along the progression of that process. SPD aims to organize the samples in a manner that reveals the underlying progression and to simultaneously identify subsets of genes that are responsible for that progression. We demonstrate the performance of SPD on a variety of microarray datasets that were generated by sampling a biological process at different points along its progression, without providing SPD any information about the underlying process. When applied to a cell cycle time series microarray dataset, SPD was not provided any prior knowledge of the samples' time order or of which genes are cell-cycle regulated, yet SPD recovered the correct time order and identified many genes that have been associated with the cell cycle. When applied to B-cell differentiation data, SPD recovered the correct order of stages of normal B-cell differentiation and the linkage between preB-ALL tumor cells and their cell of origin, preB. When applied to mouse embryonic stem cell (ESC) differentiation data, SPD uncovered a landscape of ESC differentiation into various lineages and genes that represent both generic and lineage-specific processes. When applied to a prostate cancer microarray dataset, SPD identified gene modules that reflect a progression consistent with disease stages.
SPD may be best viewed as a novel tool for synthesizing biological hypotheses because it provides a likely biological progression underlying a microarray dataset and, perhaps more importantly, the candidate genes that regulate that progression.
Author Summary
We present a novel computational approach, Sample Progression Discovery (SPD), to discover biological progression underlying a microarray dataset. In contrast to the majority of microarray data analysis methods, which identify differences between sample groups (normal vs. cancer, treated vs. control), SPD aims to identify an underlying progression among individual samples, both within and across sample groups. We validated SPD's ability to discover biological progression using datasets of cell cycle, B-cell differentiation, and mouse embryonic stem cell differentiation. We view SPD as a hypothesis generation tool when applied to datasets where the progression is unclear. For example, when applied to a microarray dataset of cancer samples, SPD assumes that the cancer samples collected from individual patients represent different stages of an intrinsic progression underlying cancer development. The inferred relationship among the samples may therefore indicate a trajectory or hierarchy of cancer progression, which serves as a hypothesis to be tested. SPD is not limited to microarray data analysis and can be applied to a variety of high-dimensional datasets. We implemented SPD as a MATLAB graphical user interface, which is available at http://icbp.stanford.edu/software/SPD/.
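The idea of recovering a progression from unordered samples can be illustrated with a small sketch. The text above does not specify how SPD orders samples, so everything here is an assumption made for illustration: the toy expression values, the Euclidean distance, and the use of the longest path through a minimum spanning tree as the inferred order. It should not be read as SPD's actual procedure.

```python
import math
from collections import deque


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minimum_spanning_tree(samples):
    """Prim's algorithm on the complete graph of pairwise sample distances.

    Returns adjacency lists keyed by sample index.
    """
    n = len(samples)
    adjacency = {i: [] for i in range(n)}
    # best[j] = (distance from j to the growing tree, tree node achieving it)
    best = {j: (euclidean(samples[0], samples[j]), 0) for j in range(1, n)}
    while best:
        j = min(best, key=lambda k: best[k][0])
        _, parent = best.pop(j)
        adjacency[parent].append(j)
        adjacency[j].append(parent)
        for k in best:
            d = euclidean(samples[j], samples[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return adjacency


def farthest_path(adjacency, start):
    """Breadth-first search; returns the path from the farthest node back to start."""
    parent = {start: None}
    queue = deque([start])
    last = start
    while queue:
        last = queue.popleft()
        for neighbor in adjacency[last]:
            if neighbor not in parent:
                parent[neighbor] = last
                queue.append(neighbor)
    path = []
    node = last
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def infer_order(samples):
    """Order samples along the longest path (in edges) of the spanning tree."""
    adjacency = minimum_spanning_tree(samples)
    one_end = farthest_path(adjacency, 0)[0]
    return farthest_path(adjacency, one_end)[::-1]


# Hypothetical data: 8 samples taken along a process, 3 "genes" each.
true_samples = [[t, 0.5 * t, 0.1 * (t % 2)] for t in range(8)]
shuffled_ids = [3, 7, 0, 5, 1, 6, 2, 4]  # order in which samples are presented
shuffled = [true_samples[i] for i in shuffled_ids]

path = infer_order(shuffled)
recovered = [shuffled_ids[i] for i in path]
print(recovered)
```

An undirected tree cannot tell which end of the path is the start of the process, so the recovered order may run forward or backward. Real datasets would also need a step that selects which genes to compute distances on, which this sketch leaves out.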