Inference of Splicing Regulatory Activities by Sequence Neighborhood Analysis
Open Access
- 1 January 2006
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Genetics
- Vol. 2 (11) , e191
- https://doi.org/10.1371/journal.pgen.0020191
Abstract
Sequence-specific recognition of nucleic-acid motifs is critical to many cellular processes. We have developed a new and general method called Neighborhood Inference (NI) that predicts sequences with activity in regulating a biochemical process based on the local density of known sites in sequence space. Applied to the problem of RNA splicing regulation, NI was used to predict hundreds of new exonic splicing enhancer (ESE) and silencer (ESS) hexanucleotides from known human ESEs and ESSs. These predictions were supported by cross-validation analysis, by analysis of published splicing regulatory activity data, by sequence-conservation analysis, and by measurement of the splicing regulatory activity of 24 novel predicted ESEs, ESSs, and neutral sequences using an in vivo splicing reporter assay. These results demonstrate the ability of NI to accurately predict splicing regulatory activity and show that the scope of exonic splicing regulatory elements is substantially larger than previously anticipated. Analysis of orthologous exons in four mammals showed that the NI score of ESEs, a measure of function, is much more highly conserved above background than ESE primary sequence. This observation indicates a high degree of selection for ESE activity in mammalian exons, with surprisingly frequent interchangeability between ESE sequences.
Gene expression involves a series of steps in which specific short DNA or RNA segments are recognized by nucleic acid–binding proteins. One step that is particularly prominent and complex in humans and other vertebrates is the removal of introns and the ligation of exons in the process of pre-mRNA splicing. To better understand the sequences in exons that regulate this process, the authors have developed a method termed Neighborhood Inference that predicts the splicing regulatory activity of RNA segments based on the known splicing enhancer or silencer activity of other segments that have closely neighboring sequences. This method is applied to predict hundreds of new exonic splicing regulatory elements, as well as splicing-neutral sequences. A number of these predictions were validated experimentally, indicating that the number of exonic splicing regulatory sequences is larger than previously suspected. Neighborhood Inference scoring is also used to show that selection on exonic splicing enhancers (ESEs) frequently allows conversion of one ESE sequence to another over evolutionary time periods, suggesting that ESEs are, to at least some degree, interchangeable in constitutively spliced exons. The methods described may also find application in the study of other biomolecular processes that involve sequence-specific nucleic acid–binding proteins.
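The idea of scoring a sequence by the known activity of its close neighbors in sequence space can be illustrated with a short sketch. This is not the published NI algorithm, whose details are not given in the abstract. It is a simplified stand-in that assumes "neighbor" means a hexamer differing at exactly one position and that the score is the mean label of those neighbors (+1 for a known enhancer, -1 for a known silencer, 0 otherwise). The function names and the toy sequence sets are hypothetical placeholders, not data from the paper.

```python
BASES = "ACGU"

def neighbors(kmer):
    """Yield every sequence that differs from kmer at exactly one position."""
    for i, base in enumerate(kmer):
        for alt in BASES:
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]

def neighborhood_score(kmer, enhancers, silencers):
    """Mean label of the one-mismatch neighbors of kmer.

    Assumption for illustration: a known enhancer counts +1, a known
    silencer counts -1, and any other neighbor counts 0.
    """
    labels = [(n in enhancers) - (n in silencers) for n in neighbors(kmer)]
    return sum(labels) / len(labels)

# Placeholder sets chosen only to exercise the code; not the paper's ESE/ESS lists.
TOY_ENHANCERS = {"GAAGAA", "GAAGAC", "GAAGAG"}
TOY_SILENCERS = {"UUUGGG"}

if __name__ == "__main__":
    for query in ("GAAGAU", "UUUGGC", "ACACAC"):
        score = neighborhood_score(query, TOY_ENHANCERS, TOY_SILENCERS)
        print(query, round(score, 3))
```

A positive score in this sketch means the query sits in a region of sequence space dense with labeled enhancers, a negative score means it sits near labeled silencers, and a score near zero suggests a neutral sequence. A faithful implementation would follow the scoring and thresholds defined in the full article.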