GISMO--gene identification using a support vector machine for ORF classification
Open Access
- 14 December 2006
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 35 (2) , 540-549
- https://doi.org/10.1093/nar/gkl1083
Abstract
We present the novel prokaryotic gene finder GISMO, which combines searches for protein family domains with composition-based classification using a support vector machine. GISMO is highly accurate, exhibiting high sensitivity and specificity in gene identification. We found that it performs well on complete prokaryotic chromosomes, irrespective of their GC content, as well as on plasmids as short as 10 kb, on short genes and on genes with atypical sequence composition. Using GISMO, we found several thousand new predictions for the published genomes that are supported by extrinsic evidence, indicating that these are very likely biologically active genes. The source code for GISMO is freely available under the GPL license.
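The abstract says only that GISMO pairs protein-domain searches with an SVM over sequence composition. As a hedged illustration of the composition half, the sketch below extracts forward-strand ORFs, summarizes each as a 64-dimensional codon-usage vector and trains a linear SVM with Pegasos-style sub-gradient descent. Everything here is an assumption for illustration: the helper names (`find_orfs`, `codon_usage`, `train_linear_svm`, `svm_score`) are hypothetical, the ORF rules (ATG start, in-frame stop, minimum length, forward strand only) are simplified, and Pegasos is a swapped-in training method. GISMO's actual features, kernel and the way domain hits feed into training are not specified in this abstract.

```python
import random
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]  # 64 codons
INDEX = {c: i for i, c in enumerate(CODONS)}

def find_orfs(seq, min_len=90):
    """Hypothetical helper: forward-strand ORFs from the first ATG to the
    next in-frame stop (stop included), kept if >= min_len nt.
    Later ATGs inside an open ORF are ignored (longest ORF per stop)."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

def codon_usage(orf_seq):
    """Relative frequency of each of the 64 codons (stop codon counted);
    codons containing non-ACGT characters are skipped."""
    orf_seq = orf_seq.upper()
    counts = [0.0] * 64
    n = 0
    for i in range(0, len(orf_seq) - 2, 3):
        idx = INDEX.get(orf_seq[i:i + 3])
        if idx is not None:
            counts[idx] += 1
            n += 1
    return [c / n for c in counts] if n else counts

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos sub-gradient descent on the regularized hinge loss.
    Labels are +1/-1; a constant 1.0 feature acts as a (regularized) bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    for t in range(1, epochs * len(data) + 1):
        x, label = data[rng.randrange(len(data))]
        eta = 1.0 / (lam * t)
        margin = label * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1 - eta * lam) * wi for wi in w]  # shrink (regularization step)
        if margin < 1:  # hinge loss active: step toward the example
            w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def svm_score(w, x):
    """Signed decision value; positive means predicted coding ORF."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
```

In a GISMO-like pipeline one might label ORFs with a protein-domain hit as positives and score the remaining ORFs with `svm_score(w, codon_usage(orf))`; that labeling scheme is an assumption here, not a description of GISMO's published procedure. A kernel SVM (for example an RBF kernel) would replace the linear model if composition classes are not linearly separable.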