PhyME: A probabilistic algorithm for finding motifs in sets of orthologous sequences
Open Access
- 28 October 2004
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 5 (1) , 170
- https://doi.org/10.1186/1471-2105-5-170
Abstract
Background: This paper addresses the problem of discovering transcription factor binding sites in heterogeneous sequence data, which includes regulatory sequences of one or more genes, as well as their orthologs in other species. Results: We propose an algorithm that integrates two important aspects of a motif's significance – overrepresentation and cross-species conservation – into one probabilistic score. The algorithm allows the input orthologous sequences to be related by any user-specified phylogenetic tree. It is based on the Expectation-Maximization technique, and scales well with the number of species and the length of input sequences. We evaluate the algorithm on synthetic data, and also present results for data sets from yeast, fly, and human. Conclusions: The results demonstrate that the new approach improves motif discovery by exploiting multiple species information.
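As a rough illustration of the Expectation-Maximization idea mentioned in the abstract, the following is a minimal sketch of EM motif finding on a single species, assuming exactly one motif occurrence per sequence. It is not the PhyME algorithm: the phylogenetic tree and the cross-species conservation term are omitted entirely, and the function names (`background`, `window_ratios`, `em_from_seed`, `find_motif`), the 0.7/0.1 seed weights, and the pseudocount are hypothetical choices made only for this example.

```python
import math

ALPHABET = "ACGT"

def background(seqs):
    """Base frequencies over all input sequences (with add-one smoothing)."""
    counts = {c: 1.0 for c in ALPHABET}
    for s in seqs:
        for c in s:
            counts[c] += 1.0
    total = sum(counts.values())
    return {c: counts[c] / total for c in ALPHABET}

def window_ratios(seq, pwm, bg):
    """Motif-vs-background likelihood ratio for every window of seq."""
    w = len(pwm)
    return [math.prod(pwm[j][seq[i + j]] / bg[seq[i + j]] for j in range(w))
            for i in range(len(seq) - w + 1)]

def em_from_seed(seqs, seed, bg, iterations=30, pseudo=0.5):
    """Run EM starting from a weight matrix built around one seed word."""
    w = len(seed)
    # Assumed initialisation: most of the mass on the seed base per column.
    pwm = [{c: (0.7 if c == s else 0.1) for c in ALPHABET} for s in seed]
    for _ in range(iterations):
        counts = [{c: pseudo for c in ALPHABET} for _ in range(w)]
        for seq in seqs:
            # E-step: posterior over the motif start position in this sequence.
            ratios = window_ratios(seq, pwm, bg)
            z = sum(ratios)
            for i, r in enumerate(ratios):
                for j in range(w):
                    counts[j][seq[i + j]] += r / z
        # M-step: re-estimate each column from the expected base counts.
        pwm = [{c: col[c] / sum(col.values()) for c in ALPHABET}
               for col in counts]
    loglik, starts = 0.0, []
    for seq in seqs:
        ratios = window_ratios(seq, pwm, bg)
        # Log-likelihood ratio with a uniform prior on the start position.
        loglik += math.log(sum(ratios) / len(ratios))
        starts.append(max(range(len(ratios)), key=ratios.__getitem__))
    return loglik, pwm, starts

def find_motif(seqs, width):
    """Seed EM from every window of the first sequence; keep the best run."""
    bg = background(seqs)
    first = seqs[0]
    seeds = [first[i:i + width] for i in range(len(first) - width + 1)]
    return max((em_from_seed(seqs, s, bg) for s in seeds),
               key=lambda run: run[0])
```

Calling `find_motif(seqs, 7)` on a list of equal-alphabet DNA strings returns the log-likelihood ratio, the estimated weight matrix, and the most probable start position in each sequence. This sketch captures only overrepresentation; a method of the kind the abstract describes would additionally score how well each candidate site is conserved across orthologs related by a tree.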