MER41 Repeat Sequences Contain Inducible STAT1 Binding Sites
Open Access
- 6 July 2010
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 5 (7) , e11425
- https://doi.org/10.1371/journal.pone.0011425
Abstract
Chromatin immunoprecipitation combined with massively parallel sequencing (ChIP-seq) is becoming the standard approach to study interactions of transcription factors (TFs) with genomic sequences. Using public STAT1 ChIP-seq data sets as an example, we present novel approaches for the interpretation of ChIP-seq data. We compare recently developed approaches to determine STAT1 binding sites from ChIP-seq data. Assessing the occurrence of the established consensus sequence for STAT1 binding sites, we find that the use of “negative control” ChIP-seq data fails to provide substantial advantages. We derive a single refined probabilistic model of STAT1 binding sequences from these ChIP-seq data. Contrary to previous claims, we find no evidence that STAT1 binds to multiple distinct motifs upon interferon-gamma stimulation in vivo. While a large majority of genomic sites with high ChIP-seq signal is associated with a nucleotide sequence resembling a STAT1 binding site, only a very small subset of the over 5 million potential STAT1 binding sites in the human genome is covered by ChIP-seq data. Furthermore, a surprisingly large fraction of the ChIP-seq signal (5%) is absorbed by a small family of repetitive sequences (MER41). The observation that activated STAT1 protein binds to a specific repetitive element bolsters similar reports concerning p53 and other TFs, and strengthens the notion that repeats are involved in gene regulation. Incidentally, MER41 elements are specific to primates; consequently, regulatory mechanisms in the IFN-STAT pathway might fundamentally differ between primates and rodents. On the methodological side, the presence of large numbers of nearly identical binding sites in repetitive sequences may lead to wrong conclusions about the intrinsic binding preferences of TFs, as illustrated by the spacing analysis of STAT1 tandem motifs. Therefore, ChIP-seq data should be analyzed independently within repetitive and non-repetitive sequences.
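The count of potential STAT1 binding sites in the genome rests on a consensus motif. As a minimal sketch, assuming the widely used GAS (gamma-activated sequence) consensus TTCNNNGAA as that motif (the abstract does not state which consensus variant was used), candidate sites can be enumerated with a regular expression. Because TTCNNNGAA is its own reverse complement, a single forward scan also covers the opposite strand. The names `find_gas_sites` and `count_gas_sites` are hypothetical.

```python
import re

# Assumed consensus: GAS motif TTCNNNGAA. The zero-width lookahead "(?=...)"
# lets overlapping matches be reported; [ACGT] skips ambiguous 'N' bases.
GAS_PATTERN = re.compile(r"(?=TTC[ACGT]{3}GAA)")


def find_gas_sites(seq: str) -> list[int]:
    """Return 0-based start positions of TTCNNNGAA matches in seq.

    TTCNNNGAA equals its own reverse complement, so minus-strand matches
    occupy exactly the same positions as plus-strand ones.
    Upper-casing also un-masks soft-masked (lowercase) repeat sequence.
    """
    return [m.start() for m in GAS_PATTERN.finditer(seq.upper())]


def count_gas_sites(records: dict[str, str]) -> int:
    """Total candidate sites across sequences keyed by name (e.g. chromosomes)."""
    return sum(len(find_gas_sites(s)) for s in records.values())


if __name__ == "__main__":
    toy = {"chrA": "acgttccgggaat", "chrB": "TTCAAAGAATTCTTTGAA"}
    print(count_gas_sites(toy))  # prints 3
```

On a real assembly the resulting count depends heavily on the motif definition (strict TTCNNNGAA versus looser variants), so this sketch is not expected to reproduce the figure quoted in the abstract.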
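The 5% figure is a fraction of ChIP-seq signal overlapping MER41 copies. A minimal sketch of such a computation follows. It assumes that signal is represented as aligned read positions (or peak summits) on one chromosome, and that repeats are given as sorted, non-overlapping half-open intervals, for example MER41 entries taken from a RepeatMasker annotation (an assumption; the abstract does not name the annotation source). The same membership test splits peaks into the repetitive and non-repetitive sets that the abstract recommends analyzing separately. Function names and toy coordinates are hypothetical, not data from the study.

```python
from bisect import bisect_right


def make_lookup(repeats: list[tuple[int, int]]):
    """Build a membership test for sorted, non-overlapping [start, end) intervals."""
    starts = [s for s, _ in repeats]

    def inside(pos: int) -> bool:
        # Rightmost interval whose start is <= pos; pos is inside if before its end.
        i = bisect_right(starts, pos) - 1
        return i >= 0 and pos < repeats[i][1]

    return inside


def fraction_in_repeats(positions: list[int], repeats: list[tuple[int, int]]) -> float:
    """Fraction of signal positions (reads or summits) that fall inside a repeat."""
    if not positions:
        return 0.0
    inside = make_lookup(repeats)
    return sum(inside(p) for p in positions) / len(positions)


def split_by_repeat(summits: list[int], repeats: list[tuple[int, int]]):
    """Partition peak summits into (repeat, non_repeat) lists."""
    inside = make_lookup(repeats)
    rep = [s for s in summits if inside(s)]
    non = [s for s in summits if not inside(s)]
    return rep, non


if __name__ == "__main__":
    mer41 = [(100, 200), (500, 600)]  # toy intervals, not real MER41 coordinates
    reads = [50, 150, 199, 200, 550, 700]
    print(fraction_in_repeats(reads, mer41))  # prints 0.5
    print(split_by_repeat([150, 300, 520], mer41))  # prints ([150, 520], [300])
```

Binary search keeps each lookup logarithmic in the number of repeat intervals, which matters when millions of reads are tested against a genome-wide annotation.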
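The spacing pitfall can be illustrated with a toy tally of distances between adjacent motif occurrences. As a sketch, assume each bound region is given as a list of motif start positions plus a flag for whether it lies in a repeat; the function names, the 30 bp cutoff, and the toy regions are hypothetical. Three near-identical repeat copies carrying the same tandem pair dominate the pooled histogram, whereas tallying repetitive and non-repetitive regions separately keeps that duplicated spacing from masquerading as an intrinsic binding preference.

```python
from collections import Counter


def tandem_spacings(starts: list[int], max_gap: int = 30) -> list[int]:
    """Distances between consecutive motif starts, keeping only gaps <= max_gap."""
    s = sorted(starts)
    return [b - a for a, b in zip(s, s[1:]) if b - a <= max_gap]


def spacing_histograms(regions, max_gap: int = 30) -> dict[str, Counter]:
    """Tally spacings separately for repeat and non-repeat regions, plus pooled."""
    hist = {"repeat": Counter(), "non_repeat": Counter(), "pooled": Counter()}
    for starts, in_repeat in regions:
        gaps = tandem_spacings(starts, max_gap)
        hist["repeat" if in_repeat else "non_repeat"].update(gaps)
        hist["pooled"].update(gaps)
    return hist


if __name__ == "__main__":
    # Toy data: three near-identical repeat copies share an 11 bp spacing.
    regions = [([10, 21], True)] * 3 + [([5, 17], False), ([40, 52], False)]
    h = spacing_histograms(regions)
    print(h["pooled"].most_common(1))  # prints [(11, 3)]
    print(h["non_repeat"].most_common(1))  # prints [(12, 2)]
```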