MORPH: Probabilistic Alignment Combined with Hidden Markov Models of cis-Regulatory Modules
Open Access
- 9 November 2007
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 3 (11), e216
- https://doi.org/10.1371/journal.pcbi.0030216
Abstract
The discovery and analysis of cis-regulatory modules (CRMs) in metazoan genomes is crucial for understanding the transcriptional control of development and many other biological processes. Cross-species sequence comparison holds much promise for improving computational prediction of CRMs, for elucidating their binding site composition, and for understanding how they evolve. Current methods for analyzing orthologous CRMs from multiple species rely upon sequence alignments produced by off-the-shelf alignment algorithms, which do not exploit the presence of binding sites in the sequences. We present here a unified probabilistic framework, called MORPH, that integrates the alignment task with binding site predictions, allowing more robust CRM analysis in two species. The framework sums over all possible alignments of two sequences, thus accounting for alignment ambiguities in a natural way. We perform extensive tests on orthologous CRMs from two moderately diverged species, Drosophila melanogaster and D. mojavensis, to demonstrate the advantages of the new approach. We show that it can overcome certain computational artifacts of traditional alignment tools and provide a different, likely more accurate, picture of cis-regulatory evolution than that obtained from existing methods. The burgeoning field of cis-regulatory evolution, which is amply supported by the availability of many related genomes, is currently thwarted by the lack of accurate alignments of regulatory regions. Our work will fill this void and enable more reliable analysis of CRM evolution.
Interspecies comparison of regulatory sequences is a major focus in the bioinformatics community today. There is extensive ongoing effort toward measuring the extent and patterns of binding site turnover in cis-regulatory modules. A major roadblock in such analysis has been that traditional alignment methods are not very accurate for regulatory sequences. This is partly because the alignment is performed independently of the binding site predictions and turnover analysis. This article describes a new computational method to compare and align two orthologous regulatory sequences. It uses a unified probabilistic framework to perform alignment and binding site prediction simultaneously, rather than one after the other. Predictions of binding sites and their evolutionary relationships are obtained after summing over all possible alignments, making them robust to alignment ambiguities. The method can also be used to predict new cis-regulatory modules. The article presents extensive applications of the method on synthetic as well as real data. These include the analysis of over 200 cis-regulatory modules in D. melanogaster and their orthologs in D. mojavensis. This analysis reveals a significantly greater degree of conservation of binding sites between these two species than would be inferred from existing alignment tools.
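The idea of summing over all possible alignments, rather than committing to a single best one, can be illustrated with a small forward-backward dynamic program. The sketch below is not MORPH's model: it has no binding site states or motif emissions, and the function names, the `match_weight` scoring function, and the single `gap_weight` parameter are all assumptions made for illustration. It only shows how a total weight over every alignment of two sequences, and the posterior probability that two positions are aligned, can be computed in O(nm) time.

```python
from typing import Callable, List

# Hypothetical scoring type: weight of aligning character a with character b.
Weight = Callable[[str, str], float]


def forward_table(x: str, y: str, match_weight: Weight, gap_weight: float) -> List[List[float]]:
    """F[i][j] = summed weight of all alignments of x[:i] with y[:j]."""
    n, m = len(x), len(y)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 1.0  # the empty alignment
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # last column aligns x[i-1] with y[j-1]
                F[i][j] += F[i - 1][j - 1] * match_weight(x[i - 1], y[j - 1])
            if i > 0:            # last column is x[i-1] against a gap
                F[i][j] += F[i - 1][j] * gap_weight
            if j > 0:            # last column is y[j-1] against a gap
                F[i][j] += F[i][j - 1] * gap_weight
    return F


def backward_table(x: str, y: str, match_weight: Weight, gap_weight: float) -> List[List[float]]:
    """B[i][j] = summed weight of all alignments of x[i:] with y[j:]."""
    n, m = len(x), len(y)
    B = [[0.0] * (m + 1) for _ in range(n + 1)]
    B[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i < n and j < m:
                B[i][j] += match_weight(x[i], y[j]) * B[i + 1][j + 1]
            if i < n:
                B[i][j] += gap_weight * B[i + 1][j]
            if j < m:
                B[i][j] += gap_weight * B[i][j + 1]
    return B


def match_posteriors(x: str, y: str, match_weight: Weight, gap_weight: float) -> List[List[float]]:
    """P[i][j] = fraction of total alignment weight in which x[i] is aligned to y[j]."""
    F = forward_table(x, y, match_weight, gap_weight)
    B = backward_table(x, y, match_weight, gap_weight)
    Z = F[len(x)][len(y)]  # total weight over all alignments
    return [
        [F[i][j] * match_weight(x[i], y[j]) * B[i + 1][j + 1] / Z for j in range(len(y))]
        for i in range(len(x))
    ]


if __name__ == "__main__":
    # Illustrative weights only; they are not estimated from any data.
    score = lambda a, b: 0.9 if a == b else 0.1
    for row in match_posteriors("ACGT", "AGT", score, 0.2):
        print(["%.3f" % p for p in row])
```

In this simplified scheme, adjacent gaps in the two sequences are counted in both orders as distinct alignments. A pair hidden Markov model with explicit match and gap states would handle that more carefully, and a framework like the one described above would additionally model binding sites within the same sum.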