Inferring Pathway Activity toward Precise Disease Classification

Top Cited Papers
Open Access
Abstract
The advent of microarray technology has made it possible to classify disease states based on gene expression profiles of patients. Typically, marker genes are selected by measuring the power of their expression profiles to discriminate among patients of different disease states. However, expression-based classification can be challenging in complex diseases due to factors such as cellular heterogeneity within a tissue sample and genetic heterogeneity across patients. A promising technique for coping with these challenges is to incorporate pathway information into the disease classification procedure in order to classify disease based on the activity of entire signaling pathways or protein complexes rather than on the expression levels of individual genes or proteins. We propose a new classification method based on pathway activities inferred for each patient. For each pathway, an activity level is summarized from the gene expression levels of its condition-responsive genes (CORGs), defined as the subset of genes in the pathway whose combined expression delivers optimal discriminative power for the disease phenotype. We show that classifiers using pathway activity achieve better performance than classifiers based on individual gene expression, for both simple and complex case-control studies including differentiation of perturbed from non-perturbed cells and subtyping of several different kinds of cancer. Moreover, the new method outperforms several previous approaches that use a static (i.e., non-conditional) definition of pathways. Within a pathway, the identified CORGs may facilitate the development of better diagnostic markers and the discovery of core alterations in human disease. The advent of microarray technology has drawn immense interest to identify gene expression levels that can serve as biomarkers for disease. Marker genes are selected by examining each individual gene to see how well its expression level discriminates different disease types. In complex diseases such as cancer, good marker genes can be hard to find due to cellular heterogeneity within the tissue and genetic heterogeneity across patients. A promising technique for addressing these challenges is to incorporate biological pathway information into the marker identification procedure, permitting disease classification based on the activity of entire pathways rather than simply on the expression levels of individual genes. However, previous pathway-based methods have not significantly outperformed gene-based methods. Here, we propose a new pathway-based classification procedure in which markers are encoded not as individual genes, nor as the set of genes making up a known pathway, but as subsets of “condition-responsive genes (CORGs)” within those pathways. Using expression profiles from seven different microarray studies, we show that the accuracy of this method is significantly better than both the conventional gene- and pathway- based diagnostics. Furthermore, the identified CORGs may facilitate the development of effective diagnostic markers and the discovery of molecular mechanisms underlying disease.