Analysis of the occurrence of promoter-sites in DNA

Abstract
We show that the occurrence and homology score (1) of promoter-sites in DNA depend upon the base composition of the DNA. We used simple probability theory to calculate the mean homology score expected for all promoter-sites that had a specific match in the canonical hexamers. By using the square root of this mean score as a measure of significance, we objectively classify all promoter-sites that are reported. We tested the theoretical approach in two ways. First, we used the program PROMSEARCH (1) to analyze ~150,000 base-pairs of random-sequence DNA with different base compositions, and we found excellent agreement with the theoretical predictions. Our second test was the analysis of a number of sequences drawn from the GENBANK DNA sequence database. We analyzed 20 bacterial and bacteriophage sequences, each consisting of at least one operon, for promoter-sites. We found no absolute preference for promoter-sites within noncoding regions. We show the results of analyzing the phages λ, T7 and fd, and the E. coli lac operon. The major known promoters in these sequences were all found correctly. We discuss the location of a number of minor promoter-sites and show how PROMSEARCH can be used to help identify the correct location of a promoter. This approach can be applied to the search for any DNA site and should allow greater objectivity when comparing DNA sequences for meaningful subsequences.
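
The dependence of site occurrence on base composition can be illustrated with a short calculation. The following Python sketch is not the PROMSEARCH algorithm and does not compute the homology score; it only assumes that bases are drawn independently with fixed frequencies, and it counts how many hexamer windows would be expected to match a consensus in at least a given number of positions. The function names are hypothetical, and the hexamer TATAAT (the commonly cited E. coli -10 consensus) is an assumption, since the canonical hexamers are not spelled out here.

```python
def match_count_distribution(consensus, base_freqs):
    """Return a list d where d[k] is the probability that exactly k positions
    of a random window match `consensus`, assuming independent bases drawn
    with the frequencies in `base_freqs` (an illustrative assumption)."""
    dist = [1.0]  # with zero positions examined, zero matches is certain
    for base in consensus:
        p = base_freqs[base]  # chance that a random base matches here
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)   # this position mismatches
            new[k + 1] += prob * p       # this position matches
        dist = new
    return dist


def expected_sites(consensus, base_freqs, min_matches, seq_length):
    """Expected number of windows in a random sequence of `seq_length` bases
    that match `consensus` in at least `min_matches` positions."""
    dist = match_count_distribution(consensus, base_freqs)
    p_site = sum(dist[min_matches:])
    windows = seq_length - len(consensus) + 1
    return p_site * windows


if __name__ == "__main__":
    # Two hypothetical base compositions: uniform and A+T rich.
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    at_rich = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}
    for name, freqs in (("uniform", uniform), ("A+T rich", at_rich)):
        n = expected_sites("TATAAT", freqs, min_matches=5, seq_length=150_000)
        print(f"{name}: about {n:.1f} windows with >= 5 of 6 matches")
```

Because TATAAT consists only of A and T, the A+T rich composition yields more expected matches than the uniform one, which is the qualitative effect of base composition described above.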