An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences
- 1 January 1990
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 7 (1), 41-51
- https://doi.org/10.1002/prot.340070105
Abstract
Statistical methodology for the identification and characterization of protein binding sites in a set of unaligned DNA fragments is presented. Each sequence must contain at least one common site. No alignment of the sites is required. Instead, the uncertainty in the location of the sites is handled by employing the missing information principle to develop an “expectation maximization” (EM) algorithm. This approach allows for the simultaneous identification of the sites and characterization of the binding motifs. The reliability of the algorithm increases with the number of fragments, but the computations increase only linearly. The method is illustrated with an example, using known cyclic adenosine monophosphate receptor protein (CRP) binding sites. The final motif is utilized in a search for undiscovered CRP binding sites.
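The idea of treating the unknown site locations as missing data can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes exactly one site per sequence, a uniform prior over start positions, a fixed background letter distribution, and a random starting motif. The names `em_motif`, `width`, `n_iter`, and `pseudo` are hypothetical and chosen only for this illustration.

```python
import math
import random

ALPHABET = "ACGT"


def em_motif(seqs, width, n_iter=50, pseudo=0.5, seed=0):
    """EM sketch for one motif occurrence per sequence (simplifying assumption).

    Returns (pwm, post): pwm[j][k] is the estimated probability of letter k at
    motif column j; post[i][s] is the posterior probability that the site in
    sequence i starts at position s.
    """
    rng = random.Random(seed)
    idx = {c: i for i, c in enumerate(ALPHABET)}

    # Background letter frequencies, estimated once from all sequences.
    counts = [pseudo] * len(ALPHABET)
    for s in seqs:
        for c in s:
            counts[idx[c]] += 1
    total = sum(counts)
    bg = [c / total for c in counts]

    # Random, strictly positive initial position weight matrix.
    pwm = []
    for _ in range(width):
        row = [rng.random() + 0.1 for _ in ALPHABET]
        z = sum(row)
        pwm.append([v / z for v in row])

    post = []
    for _ in range(n_iter):
        # E-step: posterior over the missing start position in each sequence,
        # computed from the motif-versus-background likelihood ratio.
        post = []
        for s in seqs:
            logw = []
            for start in range(len(s) - width + 1):
                lw = 0.0
                for j in range(width):
                    k = idx[s[start + j]]
                    lw += math.log(pwm[j][k] / bg[k])
                logw.append(lw)
            m = max(logw)  # subtract the maximum for numerical stability
            w = [math.exp(v - m) for v in logw]
            z = sum(w)
            post.append([v / z for v in w])

        # M-step: re-estimate each motif column from expected letter counts.
        new = [[pseudo] * len(ALPHABET) for _ in range(width)]
        for s, p in zip(seqs, post):
            for start, weight in enumerate(p):
                for j in range(width):
                    new[j][idx[s[start + j]]] += weight
        pwm = [[v / sum(row) for v in row] for row in new]

    return pwm, post


if __name__ == "__main__":
    # Toy input invented for illustration; not data from the paper.
    toy = ["ACGTTGTGAACC", "TTGTGAGGCATA", "CCATGTGATTAC"]
    pwm, post = em_motif(toy, width=5)
    for s, p in zip(toy, post):
        best = max(range(len(p)), key=p.__getitem__)
        print(s, best, s[best:best + 5])
```

Each iteration visits every candidate window in every sequence once, so the cost per iteration grows linearly with the number of fragments, in line with the abstract's remark on computation. Like any EM procedure, the sketch can converge to a local optimum, so in practice one would try several starting motifs.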