Evaluation of genomic island predictors using a comparative genomics approach
Open Access
- 5 August 2008
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 9 (1), 329
- https://doi.org/10.1186/1471-2105-9-329
Abstract
Background: Genomic islands (GIs) are clusters of genes in prokaryotic genomes of probable horizontal origin. GIs are disproportionately associated with microbial adaptations of medical or environmental interest. Recently, multiple programs for automated detection of GIs have been developed that utilize sequence composition characteristics, such as G+C ratio and dinucleotide bias. To robustly evaluate the accuracy of such methods, we propose that a dataset of GIs be constructed using criteria that are independent of sequence composition-based analysis approaches.
Results: We developed a comparative genomics approach (IslandPick) that identifies both very probable islands and non-island regions. The approach involves 1) flexible, automated selection of comparative genomes for each query genome, using a distance function that picks appropriate genomes for identification of GIs, 2) identification of regions unique to the query genome, compared with the chosen genomes (positive dataset), and 3) identification of regions conserved across all genomes (negative dataset). Using our constructed datasets, we investigated the accuracy of several sequence composition-based GI prediction tools.
Conclusion: Our results indicate that AlienHunter has the highest recall but the lowest measured precision, while SIGI-HMM is the most precise method. SIGI-HMM and IslandPath/DIMOB have comparable, and the highest, overall accuracy. Our comparative genomics approach, IslandPick, was the most accurate when compared with a curated list of GIs, indicating that we have constructed suitable datasets. This represents the first evaluation of the accuracy of several sequence composition-based GI predictors using diverse and independent datasets that were not artificially constructed. The caveats associated with this analysis and proposals for optimal island prediction are discussed.
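The abstract compares tools by precision, recall and overall accuracy against a positive dataset (probable islands) and a negative dataset (conserved, non-island regions). As a rough illustration of how such scores could be computed, the sketch below counts nucleotide positions. The per-nucleotide counting, the inclusive `(start, end)` interval format, and the function names `covered_positions` and `evaluate` are all assumptions made for this example; the abstract does not state the exact definitions used in the paper.

```python
def covered_positions(intervals):
    """Expand inclusive (start, end) intervals into a set of positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def evaluate(predicted, positive, negative):
    """Score predicted islands against positive and negative datasets.

    Assumption: scores are counted per nucleotide, and positions that
    belong to neither dataset are treated as unlabelled and ignored.
    """
    pred = covered_positions(predicted)
    pos = covered_positions(positive)
    neg = covered_positions(negative)

    tp = len(pred & pos)   # predicted, and in a probable island
    fp = len(pred & neg)   # predicted, but in a conserved region
    fn = len(pos - pred)   # island positions the tool missed
    tn = len(neg - pred)   # conserved positions correctly left out

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Hypothetical coordinates, chosen only to make the arithmetic easy to follow.
metrics = evaluate(
    predicted=[(1, 10), (21, 25)],
    positive=[(1, 20)],
    negative=[(21, 40)],
)
print(metrics)
```

In this toy case the tool recovers half of the island positions (recall 0.5), and two thirds of its labelled predictions fall inside the island (precision about 0.67). A tool that predicts more broadly would tend to raise recall at the cost of precision, which is the kind of trade-off the abstract reports between AlienHunter and SIGI-HMM.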