Detection of Short Protein Coding Regions within the Cyanobacterium Genome: Application of the Hidden Markov Model
Open Access
- 1 January 1996
- journal article
- research article
- Published by Oxford University Press (OUP) in DNA Research
- Vol. 3 (6) , 355-361
- https://doi.org/10.1093/dnares/3.6.355
Abstract
The gene-finding programs developed so far have not paid much attention to the detection of short protein coding regions (CDSs). However, the detection of short CDSs is important for the study of photosynthesis. We utilized GeneHacker, a gene-finding program based on the hidden Markov model (HMM), to detect short CDSs (from 90 to 300 bases) in a 1.0-megabase contiguous sequence of the cyanobacterium Synechocystis sp. strain PCC6803, which carries a complete set of genes for oxygenic photosynthesis. GeneHacker differs from other HMM-based gene-finding programs in that it also utilizes di-codon statistics. GeneHacker successfully detected seven of the eight short CDSs annotated in this sequence and was clearly superior to GeneMark in this length range. GeneHacker also detected 94 potentially new CDSs, 9 of which have counterparts in the genetic databases. Four of these nine CDSs were shorter than 150 bases and were photosynthesis-related genes. The results show the effectiveness of GeneHacker in detecting very short CDSs corresponding to genes.
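To make the two ingredients named in the abstract more concrete, the following is a minimal illustrative sketch, not GeneHacker's actual algorithm: it enumerates forward-strand open reading frames in the 90 to 300 base range and counts in-frame di-codons (adjacent codon pairs) in a sequence. The function names `short_orfs` and `dicodon_counts` are hypothetical, and several simplifications are assumed: only ATG is treated as a start codon, only the forward strand is scanned, the length is taken to include the stop codon, and the HMM itself is not reproduced.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # assumption: alternative starts such as GTG/TTG are ignored


def short_orfs(seq, min_len=90, max_len=300):
    """Return (start, end) pairs of forward-strand ORFs whose length in
    bases, counted from the start codon through the stop codon, lies in
    [min_len, max_len]. Coordinates are 0-based, end-exclusive."""
    seq = seq.upper()
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if min_len <= end - start <= max_len:
                    found.append((start, end))
                start = None  # resume searching after the stop codon
    return found


def dicodon_counts(cds):
    """Count adjacent in-frame codon pairs (six-base words) in a CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return Counter(a + b for a, b in zip(codons, codons[1:]))
```

In a real gene finder, di-codon counts gathered from known coding regions would be turned into probabilities and combined with the model's other statistics; how GeneHacker does this is not described in the abstract and is not attempted here.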