Mammalian RNA polymerase II core promoters: insights from genome-wide studies

Abstract
Genome-wide methods have identified fivefold to tenfold more transcription start sites (TSSs) than were previously known to exist. Many of these occur at unexpected locations, such as assumed gene deserts, exons and 3′ UTRs of known genes. Most promoters are not represented by the accepted model of a single TSS with an upstream TATA-box; a cluster of TSSs in a narrow region of genomic DNA is the most common pattern. Core promoters can be classified according to the distribution and relative usage of their TSSs. The TSS distribution of core promoters is tightly coupled to the occurrence of both known cis-regulatory elements and gene function, and is generally conserved between humans and mice. Few promoters use an extended initiator sequence to define the TSS. The most consistent pattern is a pyrimidine–purine dinucleotide that overlaps the TSS. Most genes have at least two distinct promoters, which may be differentially regulated and generate mRNAs that encode different protein isoforms. The wealth of TSS data enables new types of analysis, including the study of promoter evolution and functional analysis of promoters on a genome-wide scale.