Decoding Human Regulatory Circuits

Abstract
Clusters of transcription factor binding sites (TFBSs) that direct gene expression constitute cis-regulatory modules (CRMs). We present a novel algorithm, based on Gibbs sampling, which locates, de novo, the cis features of these CRMs, their component TFBSs, and the properties of their spatial distribution. The algorithm finds 69% of experimentally reported TFBSs and 85% of the CRMs in a reference data set of regions upstream of genes differentially expressed in skeletal muscle cells. A discriminant procedure based on the output of the model specifically distinguished regulatory sequences in muscle-specific genes in an independent test set. Application of the method to the analysis of 2710 10-kb fragments upstream of annotated human genes identified 17 novel candidate modules with a false discovery rate ≤0.05, demonstrating the applicability of the method to genome-scale data.