Similarity of position frequency matrices for transcription factor binding sites
- 19 August 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (3), 307-313
- https://doi.org/10.1093/bioinformatics/bth480
Abstract
Motivation: Transcription-factor binding sites (TFBS) in promoter sequences of higher eukaryotes are commonly modeled using position frequency matrices (PFM). The ability to compare PFMs representing binding sites is especially important for de novo sequence motif discovery, where it is desirable to compare putative matrices to one another and to known matrices.
Results: We describe a PFM similarity quantification method based on product multinomial distributions, demonstrate its ability to identify PFM similarity and show that it has a better false positive to false negative ratio compared to existing methods. We grouped TFBS frequency matrices from two libraries into matrix families and identified the matrices that are common and unique to these libraries. We identified similarities and differences between the skeletal-muscle-specific and non-muscle-specific frequency matrices for the binding sites of Mef-2, Myf, Sp-1, SRF and TEF of Wasserman and Fickett. We further identified known frequency matrices and matrix families that were strongly similar to the matrices given by Wasserman and Fickett. We provide methodology and tools to compare and query libraries of frequency matrices for TFBSs.
Availability: Software is available to use over the Web at http://rulai.cshl.edu/MatCompare
Contact: dschones@cshl.edu
Supplementary information: Database and clustering statistics, matrix families and representatives are available at http://rulai.cshl.edu/MatCompare/Supplementary
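As an illustration of what a product multinomial view of a PFM implies, the sketch below treats each matrix position as an independent multinomial sample over A, C, G and T and scores two aligned positions with a Pearson chi-square statistic for homogeneity. This is an assumption made for illustration only: the abstract does not specify the statistic, the alignment procedure or any thresholds used by the authors, and the function names `chi_square_homogeneity` and `column_scores` are hypothetical.

```python
from typing import List, Sequence


def chi_square_homogeneity(col_a: Sequence[float], col_b: Sequence[float]) -> float:
    """Pearson chi-square statistic for the hypothesis that two count
    columns (A, C, G, T) come from the same multinomial distribution.

    Illustrative choice of statistic, not necessarily the paper's.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of categories")
    n_a, n_b = sum(col_a), sum(col_b)
    if n_a <= 0 or n_b <= 0:
        raise ValueError("each column must contain at least one observed site")
    total = n_a + n_b
    stat = 0.0
    for a, b in zip(col_a, col_b):
        pooled = a + b
        if pooled == 0:
            continue  # a base absent from both columns contributes nothing
        # Expected counts under a shared base distribution
        exp_a = n_a * pooled / total
        exp_b = n_b * pooled / total
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat


def column_scores(pfm_a: Sequence[Sequence[float]],
                  pfm_b: Sequence[Sequence[float]]) -> List[float]:
    """Score two already-aligned, equal-length PFMs position by position.

    Each PFM is a list of positions; each position holds counts for A, C, G, T.
    Positions are treated as independent, which is the product multinomial
    assumption.
    """
    if len(pfm_a) != len(pfm_b):
        raise ValueError("PFMs must be aligned to the same length")
    return [chi_square_homogeneity(a, b) for a, b in zip(pfm_a, pfm_b)]


if __name__ == "__main__":
    # Toy matrices with made-up counts, three positions each
    pfm_1 = [[10, 0, 0, 0], [2, 2, 2, 2], [0, 0, 9, 1]]
    pfm_2 = [[0, 10, 0, 0], [4, 4, 4, 4], [0, 0, 9, 1]]
    print(column_scores(pfm_1, pfm_2))
```

A low score at a position means the two columns are compatible with a common base distribution; a full comparison would also need to consider alternative alignments and both strand orientations, which this sketch leaves out.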
This publication has 26 references indexed in Scilit:
- Constrained Binding Site Diversity within Families of Transcription Factors Enhances Pattern Discovery Bioinformatics. Journal of Molecular Biology, 2004
- Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. Journal of Molecular Biology, 2000
- Identification of regulatory regions which confer muscle-specific gene expression. Journal of Molecular Biology, 1998
- Bayesian Models for Multiple Local Sequence Alignment and Gibbs Sampling Strategies. Journal of the American Statistical Association, 1995
- Sequence logos: a new way to display consensus sequences. Nucleic Acids Research, 1990
- Identification of consensus patterns in unaligned DNA sequences known to be functionally related. Bioinformatics, 1990
- Selection of DNA binding sites by regulatory proteins. Journal of Molecular Biology, 1988
- Selection of DNA binding sites by regulatory proteins. Journal of Molecular Biology, 1987
- Information content of binding sites on nucleotide sequences. Journal of Molecular Biology, 1986
- A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 1970