Asap: A Framework for Over-Representation Statistics for Transcription Factor Binding Sites
Open Access
- 20 February 2008
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 3 (2) , e1623
- https://doi.org/10.1371/journal.pone.0001623
Abstract
In studies of gene regulation, the efficient computational detection of over-represented transcription factor binding sites is increasingly important. Several published methods can be used for testing whether a set of hypothesised co-regulated genes share a common regulatory regime based on the occurrence of the modelled transcription factor binding sites. However, there is little or no information available to guide the end user's choice of method. Furthermore, making a well-founded choice would require obtaining several different software programs from various sources. We introduce a software package, Asap, for fast searching with position weight matrices that includes several standard methods for assessing over-representation. We have compared the ability of these methods to detect over-represented transcription factor binding sites in artificial promoter sequences. By controlling all aspects of our input data, we are able to identify the optimal statistics across multiple threshold values and for sequence sets containing different distributions of transcription factor binding sites. We show that our implementation is significantly faster than more naïve scanning algorithms when searching with many weight matrices in large sequence sets. When comparing the various statistics, we show that those based on binomial over-representation and Fisher's exact test perform almost equally well, and better than the others. An online server is available at http://servers.binf.ku.dk/asap/.
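To make the two best-performing statistics named in the abstract concrete, the following is a minimal sketch of a binomial over-representation test and a one-sided Fisher's exact test. It is not Asap's implementation: the function names, the choice to count sequences containing at least one predicted site, and the example counts are all assumptions made purely for illustration.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Rows: foreground / background sequence sets.
    Columns: sequences with / without a predicted binding site.
    Returns the hypergeometric probability of seeing `a` or more
    foreground sequences with a site, given fixed table margins.
    """
    row1 = a + b            # size of the foreground set
    col1 = a + c            # all sequences with a site
    total = a + b + c + d
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, x) * comb(total - row1, col1 - x) for x in range(a, upper + 1)
    )
    return numerator / comb(total, col1)


# Hypothetical counts (invented for this sketch, not taken from the paper):
# 12 of 20 foreground promoters and 40 of 200 background promoters
# contain at least one hit above the score threshold.
fg_hits, fg_size = 12, 20
bg_hits, bg_size = 40, 200

# Binomial test: background hit rate serves as the null success probability.
p_binomial = binomial_tail(fg_hits, fg_size, bg_hits / bg_size)

# Fisher's exact test on the corresponding contingency table.
p_fisher = fisher_exact_greater(
    fg_hits, fg_size - fg_hits, bg_hits, bg_size - bg_hits
)

print(f"binomial p = {p_binomial:.3g}, Fisher p = {p_fisher:.3g}")
```

The binomial variant treats the background hit rate as a known constant, whereas Fisher's exact test conditions on the observed margins of both sets, so the two can diverge when the background set is small.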