Learning a Prior on Regulatory Potential from eQTL Data
Open Access
- 30 January 2009
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Genetics
- Vol. 5 (1) , e1000358
- https://doi.org/10.1371/journal.pgen.1000358
Abstract
Genome-wide RNA expression data provide a detailed view of an organism's biological state; hence, a dataset measuring expression variation between genetically diverse individuals (eQTL data) may provide important insights into the genetics of complex traits. However, with data from a relatively small number of individuals, it is difficult to distinguish true causal polymorphisms from the large number of possibilities. The problem is particularly challenging in populations with significant linkage disequilibrium, where traits are often linked to large chromosomal regions containing many genes. Here, we present a novel method, Lirnet, that automatically learns a regulatory potential for each sequence polymorphism, estimating how likely it is to have a significant effect on gene expression. This regulatory potential is defined in terms of “regulatory features”—including the function of the gene and the conservation, type, and position of genetic polymorphisms—that are available for any organism. The extent to which the different features influence the regulatory potential is learned automatically, making Lirnet readily applicable to different datasets, organisms, and feature sets. We apply Lirnet both to the human HapMap eQTL dataset and to a yeast eQTL dataset and provide statistical and biological results demonstrating that Lirnet produces significantly better regulatory programs than other recent approaches. We demonstrate in the yeast data that Lirnet can correctly suggest a specific causal sequence variation within a large, linked chromosomal region. In one example, Lirnet uncovered a novel, experimentally validated connection between Puf3—a sequence-specific RNA binding protein—and P-bodies—cytoplasmic structures that regulate translation and RNA stability—as well as the particular causative polymorphism, a SNP in Mkt1, that induces the variation in the pathway. 
Gene expression data of genetically diverse individuals (eQTL data) provide a unique perspective on the effect of genetic variation on cellular pathways. However, the burden of multiple hypotheses, combined with the challenges of linkage disequilibrium, makes it difficult to correctly identify causal polymorphisms. Researchers traditionally apply heuristics for selecting among plausible hypotheses, favoring polymorphisms that are more conserved, that lead to significant amino acid change, or that reside in genes whose function is related to that of the targets. But how do we know how much weight to attribute to different regulatory features? We describe Lirnet, which learns from eQTL data how to weight regulatory features and induce a regulatory potential for sequence variations. Lirnet estimates these weights while simultaneously learning a regulatory network, finding weights that lead to a more predictive network. We show that Lirnet constructs high-accuracy regulatory programs and demonstrate its ability to correctly identify causative polymorphisms. Lirnet can flexibly use any regulatory features, including sequence features that are available for any sequenced organism, and automatically learn their weights in a dataset-specific way. This flexibility makes it especially advantageous for mammalian systems, where many forms of prior knowledge used in simple model organisms are incomplete or unavailable.
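To make the idea of a feature-weighted regulatory potential concrete, the following is a minimal illustrative sketch, not the Lirnet implementation. It assumes the potential is a logistic function of a weighted sum of regulatory features; that functional form, the feature names (`conservation`, `nonsynonymous`, `function_match`), the weight values, and the SNP labels are all hypothetical. Lirnet learns its weights jointly with the regulatory network, whereas here they are fixed by hand purely to show how such a potential could rank candidate polymorphisms within one linked region.

```python
import math


def regulatory_potential(features, weights, bias=0.0):
    """Map a polymorphism's regulatory features to a score in (0, 1).

    Assumption: a logistic function of a weighted feature sum. The source
    does not specify this form; it is used here only for illustration.
    """
    score = bias + sum(weights.get(name, 0.0) * value
                       for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def rank_candidates(candidates, weights, bias=0.0):
    """Sort candidate polymorphisms from highest to lowest potential."""
    scored = [(name, regulatory_potential(feats, weights, bias))
              for name, feats in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical, hand-set weights (a learned model would estimate these).
weights = {"conservation": 1.5, "nonsynonymous": 1.0, "function_match": 0.8}

# Hypothetical polymorphisms in a single linked chromosomal region.
candidates = {
    "snp_a": {"conservation": 0.9, "nonsynonymous": 1.0, "function_match": 1.0},
    "snp_b": {"conservation": 0.2, "nonsynonymous": 0.0, "function_match": 0.0},
    "snp_c": {"conservation": 0.6, "nonsynonymous": 1.0, "function_match": 0.0},
}

ranking = rank_candidates(candidates, weights, bias=-2.0)
for name, potential in ranking:
    print(f"{name}: {potential:.3f}")
```

In this toy setting, the conserved, nonsynonymous polymorphism in a functionally related gene ranks first, which mirrors the heuristics described above while leaving the relative weights as adjustable parameters.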