Improving promoter prediction for the NNPP2.2 algorithm: a case study using Escherichia coli DNA sequences
Open Access
- 28 September 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (5) , 601-607
- https://doi.org/10.1093/bioinformatics/bti047
Abstract
Motivation: Although a great deal of research has been undertaken in the area of promoter prediction, prediction techniques are still not fully developed. Many algorithms tend to exhibit either poor specificity, generating many false positives, or poor sensitivity. The neural network prediction program NNPP2.2 is one such example.
Results: To improve the NNPP2.2 prediction technique, the distance between the transcription start site (TSS) associated with a promoter and the translation start site (TLS) of the downstream gene coding region has been studied for Escherichia coli K12. An empirical probability distribution that is consistent across all E. coli promoters has been established. This information is combined with the results from NNPP2.2 to create a new technique, called TLS–NNPP, which improves the specificity of promoter prediction. The technique is shown to be effective on E. coli DNA sequences; however, it is applicable to any organism for which a set of promoters has been experimentally defined.
Availability: The data used in this project and the prediction results for the tested sequences can be obtained from http://www.uow.edu.au/~yanxia/E_Coli_paper/SBurden_Results.xls
Contact: alh98@uow.edu.au
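As a rough illustration of the idea in the Results paragraph, the following Python sketch builds an empirical distribution of TSS-to-TLS distances from known promoters and uses it to re-weight predictor scores. The abstract does not state how TLS–NNPP actually combines the two; multiplying the NNPP2.2 score by the binned distance probability and applying a cutoff is an assumption made here, as are the names `Prediction`, `empirical_distance_pmf`, `combined_score` and `filter_predictions`, the 10 bp bin width, the threshold, and the toy distances, none of which are data or code from the paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical record for one candidate promoter reported by a predictor such as NNPP2.2."""
    tss: int      # predicted transcription start site (bp, forward-strand coordinate)
    tls: int      # translation start site of the downstream gene (bp)
    score: float  # predictor output, assumed to lie in [0, 1]


def empirical_distance_pmf(distances, bin_width=10):
    """Estimate the probability of each distance bin from experimentally defined promoters."""
    counts = Counter(d // bin_width for d in distances)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def combined_score(pred, pmf, bin_width=10):
    """Assumed combination rule: predictor score times the empirical probability of its distance bin."""
    distance = pred.tls - pred.tss  # TSS-to-TLS distance; assumes the gene is on the forward strand
    return pred.score * pmf.get(distance // bin_width, 0.0)  # unseen bins get probability 0


def filter_predictions(preds, pmf, threshold, bin_width=10):
    """Keep only candidates whose combined score reaches the (hypothetical) threshold."""
    return [p for p in preds if combined_score(p, pmf, bin_width) >= threshold]


# Toy usage: the distances and predictions are invented for illustration only.
known_distances = [25, 30, 32, 41, 45, 48, 120, 35]
pmf = empirical_distance_pmf(known_distances)
preds = [
    Prediction(tss=1000, tls=1035, score=0.90),  # distance 35 bp, a common bin
    Prediction(tss=2000, tls=2500, score=0.95),  # distance 500 bp, never observed
    Prediction(tss=3000, tls=3042, score=0.40),  # distance 42 bp, common bin but weak score
]
for p in preds:
    print(p.tss, round(combined_score(p, pmf), 4))
kept = filter_predictions(preds, pmf, threshold=0.2)
print([p.tss for p in kept])
```

In this sketch, a candidate whose TSS sits at a distance rarely or never seen among known promoters gets a low or zero combined score and is discarded even if the raw predictor score is high. That is how a distance prior can cut false positives and raise specificity, at some risk of losing true promoters with unusual TSS-to-TLS spacing.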