Evaluating phylogenetic footprinting for human–rodent comparisons
Open Access
- 6 December 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 22 (4), 430-437
- https://doi.org/10.1093/bioinformatics/bti819
Abstract
Motivation: ‘Phylogenetic footprinting’ is a widely applied approach to identify regulatory regions and potential transcription factor binding sites (TFBSs) using alignments of non-coding orthologous regions from two or more organisms. A systematic evaluation of its validity and usability based on known TFBSs is needed to use phylogenetic footprinting most effectively in the identification of unknown TFBSs.
Results: In this paper we use 2678 human, mouse and rat TFBSs from the TRANSFAC® database for this evaluation. To ensure the retrieval of correct orthologous sequences, we combine gene annotation and sequence homology searches. Demanding a sequence identity of at least 65% is most effective in discriminating TFBSs from non-functional sequence parts, while different alignment algorithms only have a minor influence on TFBS identification by human–rodent comparisons. With this threshold ∼72% of the known TFBSs are found conserved, a number which varies significantly between different transcription factors and also depends on the function of the regulated gene. TFBSs for certain transcription factors do not require strict sequence conservation but instead may show a high pattern conservation, limiting somewhat the validity of purely sequence-based phylogenetic footprinting.
Availability: Scripts are available from the authors upon request.
Contact: tsa@bioinf.med.uni-goettingen.de
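The 65% identity criterion in the abstract can be illustrated with a minimal Python sketch. This is not the authors' script (those are available only on request). The function names, the example sequences, and the choice to count gap columns as mismatches are assumptions made for illustration; the abstract does not say how identity was computed over a site.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Assumption: every column counts toward the denominator, and a column
    containing a gap ('-') is treated as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def is_conserved(aligned_site: str, aligned_ortholog: str,
                 threshold: float = 65.0) -> bool:
    """Call a site conserved if identity meets the threshold (65% by default)."""
    return percent_identity(aligned_site, aligned_ortholog) >= threshold


# Hypothetical aligned slices covering one annotated binding site.
human_site = "GGGACTTTCC"
rodent_close = "GGGAATTTCC"   # one substitution
rodent_gapped = "GG-A-TT-C-"  # four gap columns

print(is_conserved(human_site, rodent_close))   # 9 of 10 columns match
print(is_conserved(human_site, rodent_gapped))  # 6 of 10 columns match
```

A strict column-by-column test like this would miss sites that keep their binding pattern without keeping their exact sequence, which is the limitation the abstract notes for purely sequence-based footprinting.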