Determination of Local Statistical Significance of Patterns in Markov Sequences with Application to Promoter Element Identification
- 1 January 2004
- journal article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 11 (1) , 1-14
- https://doi.org/10.1089/106652704773416858
Abstract
High-level eukaryotic genomes present a particular challenge to the computational identification of transcription factor binding sites (TFBSs) because of their long noncoding regions and large numbers of repeat elements. This is evidenced by the noisy results generated by most current methods. In this paper, we present a p-value-based scoring scheme using probability generating functions to evaluate the statistical significance of potential TFBSs. Furthermore, we introduce the local genomic context into the model so that candidate sites are evaluated based both on their similarities to known binding sites and on their contrasts against their respective local genomic contexts. We demonstrate that our approach is advantageous in the prediction of myogenin and MEF2 binding sites in the human genome. We also apply LMM to large-scale human binding site sequences in situ and find that, compared to current popular methods, LMM analysis can reduce false positive errors by more than 50% without compromising sensitivity. This improvement will be of importance to any subsequent algorithm that aims to detect regulatory modules based on known PSSMs.
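The idea of scoring a candidate site by a p-value computed from a probability generating function, against a background estimated from the local genomic context, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an order-0 (independent bases) background rather than a higher-order Markov model, integer PSSM scores, and a toy two-column matrix. The names `score_distribution`, `p_value`, and `local_background` are hypothetical helpers introduced only for this example.

```python
from collections import defaultdict


def score_distribution(pssm, background):
    """Exact distribution of the total PSSM score of a random site.

    pssm: list of dicts, one per position, mapping base -> integer score.
    background: dict mapping base -> probability (order-0 model, an
    assumption of this sketch).
    Multiplying the per-column generating functions is done here as a
    convolution over score values.
    """
    dist = {0: 1.0}  # generating function of the empty site
    for column in pssm:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for base, p_base in background.items():
                nxt[score + column[base]] += prob * p_base
        dist = dict(nxt)
    return dist


def p_value(dist, observed):
    """Probability that a random site scores at least `observed`."""
    return sum(p for s, p in dist.items() if s >= observed)


def local_background(context):
    """Base frequencies of a flanking window, with a pseudocount of 1."""
    counts = {b: 1 for b in "ACGT"}
    for ch in context.upper():
        if ch in counts:
            counts[ch] += 1
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


# Toy PSSM (hypothetical values) whose best-scoring site is "AG".
toy_pssm = [
    {"A": 2, "C": -1, "G": -1, "T": 0},
    {"A": -1, "C": 0, "G": 2, "T": -1},
]

uniform = {b: 0.25 for b in "ACGT"}
print(p_value(score_distribution(toy_pssm, uniform), 4))  # 0.0625

# The same site evaluated against an AT-rich local context.
at_rich = local_background("ATATATATAT")
print(p_value(score_distribution(toy_pssm, at_rich), 4))
```

Under the uniform background, only "AG" reaches the maximum score of 4, so the p-value is 1/16. Against the AT-rich context, G is rarer, the same score becomes less likely by chance, and the p-value drops. This mirrors the abstract's point that a site is judged by its contrast with its own surroundings as well as by its similarity to known binding sites.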