MER41 Repeat Sequences Contain Inducible STAT1 Binding Sites

Open Access

6 July 2010

journal article
research article
Published by Public Library of Science (PLoS) in PLOS ONE

Vol. 5 (7) , e11425
https://doi.org/10.1371/journal.pone.0011425

Abstract

Chromatin immunoprecipitation combined with massively parallel sequencing (ChIP-seq) is becoming the standard approach for studying interactions of transcription factors (TFs) with genomic sequences. Using public STAT1 ChIP-seq data sets as an example, we present novel approaches for the interpretation of ChIP-seq data. We compare recently developed approaches for determining STAT1 binding sites from ChIP-seq data. Judging by the content of the established STAT1 consensus sequence in the predicted binding sites, we find that the use of “negative control” ChIP-seq data fails to provide substantial advantages. We derive a single refined probabilistic model of STAT1 binding sequences from these ChIP-seq data. Contrary to previous claims, we find no evidence that STAT1 binds to multiple distinct motifs upon interferon-gamma stimulation in vivo. While a large majority of genomic sites with high ChIP-seq signal are associated with a nucleotide sequence resembling a STAT1 binding site, only a very small subset of the over 5 million potential STAT1 binding sites in the human genome is covered by ChIP-seq data. Furthermore, a surprisingly large fraction of the ChIP-seq signal (5%) is absorbed by a small family of repetitive sequences (MER41). The observation that activated STAT1 protein binds to a specific repetitive element bolsters similar reports concerning p53 and other TFs, and strengthens the notion that repeats are involved in gene regulation. Incidentally, MER41 elements are specific to primates; consequently, regulatory mechanisms in the IFN-STAT pathway might differ fundamentally between primates and rodents. On the methodological side, the presence of large numbers of nearly identical binding sites in repetitive sequences may lead to wrong conclusions about the intrinsic binding preferences of TFs, as illustrated by the spacing analysis of STAT1 tandem motifs. Therefore, ChIP-seq data should be analyzed independently within repetitive and non-repetitive sequences.
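The recommendation to analyze repetitive and non-repetitive sequences separately can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that peaks are given as hypothetical `(chrom, start, end, signal)` tuples, that repeat annotations (for example MER41 copies) are given as `(chrom, start, end)` tuples, and that the palindromic GAS-like pattern `TTCNNNGAA` is an acceptable stand-in for the STAT1 consensus. The abstract's own binding model is probabilistic and is not reproduced here. All coordinates and signal values are made-up toy data.

```python
import re

# Assumption: TTCNNNGAA as a simple stand-in for the STAT1 consensus.
# The pattern is its own reverse complement, so one strand is enough.
# The lookahead lets overlapping matches be counted.
GAS = re.compile(r"(?=(TTC[ACGT]{3}GAA))")


def count_gas_sites(seq):
    """Count (possibly overlapping) consensus matches in a DNA string."""
    return len(GAS.findall(seq.upper()))


def overlaps_repeat(peak, repeats):
    """True if a (chrom, start, end, signal) peak overlaps any repeat interval."""
    chrom, start, end, _signal = peak
    return any(
        r_chrom == chrom and start < r_end and r_start < end
        for r_chrom, r_start, r_end in repeats
    )


def split_by_repeat(peaks, repeats):
    """Partition peaks into (inside repeats, outside repeats)."""
    inside = [p for p in peaks if overlaps_repeat(p, repeats)]
    outside = [p for p in peaks if not overlaps_repeat(p, repeats)]
    return inside, outside


def signal_fraction(subset, peaks):
    """Share of the total peak signal carried by a subset of the peaks."""
    total = sum(p[3] for p in peaks)
    return sum(p[3] for p in subset) / total if total else 0.0


# Toy data (hypothetical): half-open intervals, arbitrary signal units.
peaks = [
    ("chr1", 100, 200, 25.0),
    ("chr1", 500, 600, 25.0),
    ("chr2", 50, 150, 50.0),
]
repeats = [("chr1", 150, 300)]

inside, outside = split_by_repeat(peaks, repeats)
repeat_share = signal_fraction(inside, peaks)
```

Downstream statistics, such as motif spacing, would then be computed on `inside` and `outside` separately, so that many near-identical repeat copies do not dominate the estimate. The linear scan in `overlaps_repeat` is only suitable for small inputs; an interval index would be the natural replacement for genome-scale data.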
