Computer-assisted prediction, classification, and delimitation of protein binding sites in nucleic acids
- 1 January 1993
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 21 (7) , 1655-1664
- https://doi.org/10.1093/nar/21.7.1655
Abstract
We present a method to determine the location and extent of protein binding regions in nucleic acids by computer-assisted analysis of sequence data. The program ConsIndex establishes a library of consensus descriptions based on sequence sets containing known regulatory elements. These defined consensus descriptions are used by the program ConsInspector to predict binding sites in new sequences. We show the programs to correctly determine the significant regions involved in transcriptional control of seven sequence elements. The internal profile of relative variability of individual nucleotide positions within these regions paralleled experimental profiles of biological significance. Consensus descriptions are determined by employing an anchored alignment scheme, the results of which are then evaluated by a novel method which is superior to cluster algorithms. The alignment procedure is able to include several closely related sequences without biasing the consensus description. Moreover, the algorithm detects additional elements on the basis of a moderate distance correlation and is capable of discriminating between real binding sites and false positive matches. The software is well suited to cope with the frequent phenomenon of optional elements present in a subset of functionally similar sequences, while taking maximal advantage of the existing sequence data base. Since it requires only a minimum of seven sequences for a single element, it is applicable to a wide range of binding sites.
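The general idea of deriving a consensus description from aligned sites and then using it to inspect new sequences can be illustrated with a generic sketch. This is not the ConsIndex or ConsInspector algorithm, whose details the abstract does not give. It assumes pre-aligned sites of equal length, uses Shannon information content as a stand-in for the per-position variability profile, and scores windows with a conservation-weighted match. The function names, the threshold, and the seven toy sites are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

# Hypothetical toy data: seven pre-aligned sites, NOT taken from the paper.
SITES = [
    "TGACTCA", "TGACTCA", "TGAGTCA", "TGACTCA",
    "TGACTAA", "TGAGTCA", "TGACTCA",
]


def frequency_matrix(sites):
    """Return per-position nucleotide frequencies for equal-length sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sites)
        matrix.append({b: counts.get(b, 0) / len(sites) for b in ALPHABET})
    return matrix


def conservation(column):
    """Information content in bits: 0 = fully variable, 2 = invariant."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy


def score(matrix, window):
    """Conservation-weighted match of a window, normalised to [0, 1]."""
    weights = [conservation(col) for col in matrix]
    best = sum(w * max(col.values()) for w, col in zip(weights, matrix))
    got = sum(w * col.get(b, 0.0) for w, col, b in zip(weights, matrix, window))
    return got / best if best > 0 else 0.0


def scan(matrix, sequence, threshold=0.9):
    """Return (offset, window, score) for windows scoring >= threshold."""
    n = len(matrix)
    hits = []
    for i in range(len(sequence) - n + 1):
        window = sequence[i:i + n]
        s = score(matrix, window)
        if s >= threshold:
            hits.append((i, window, s))
    return hits
```

Calling `scan(frequency_matrix(SITES), some_sequence)` lists candidate windows. Weighting by conservation means mismatches at invariant positions cost more than mismatches at variable ones, which is one simple way to let a variability profile influence site prediction.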