Consensus Promoter Identification in the Human Genome Utilizing Expressed Gene Markers and Gene Modeling
Open Access
- 1 March 2002
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 12 (3), 462–469
- https://doi.org/10.1101/gr.198002
Abstract
Deciphering the human genome includes locating the promoters that initiate transcription and identifying the exons of genes. Many promoter prediction programs have been proposed, but when they are applied to extended regions of the genome, most of their predictions are false positives. The extensive collection of gene transcript sequences is an important new source of information that has not previously been used in promoter prediction. Our approach enhances the specificity of predictions by restricting the genomic regions that are searched, using gene transcript alignments as anchors in the genome for gene modeling. We developed a consensus promoter prediction method that combines previously developed algorithms with the GENSCAN gene modeling program. Our method, CONPRO (CONsensus PROmoter), identifies promoters with very high confidence, and the predicted promoters are guaranteed to be associated with genes. On our test data set, the method correctly detects promoters for approximately half of all human genes (37%–71%), and most predictions are true promoters (85%–90%). Applying our method to the human genome and to human genes from the Unigene data set, we find promoters for 13,744 genes. Of these, 6440 are genes with a functionally cloned mRNA, and 7304 are novel genes for which only expressed sequence tags (ESTs) are available. Candidate promoters for many novel genes will be a useful resource in elucidating complex biological response mechanisms. CONPRO is available for searching promoters in the human genome (http://stl.bioinformatics.med.umich.edu/conpro).