Interpolated variable order motifs for identification of horizontally acquired DNA: revisiting the Salmonella pathogenicity islands
Open Access
- 12 July 2006
- journal article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 22 (18) , 2196-2203
- https://doi.org/10.1093/bioinformatics/btl369
Abstract
There is a growing literature on the detection of Horizontal Gene Transfer (HGT) events by means of parametric, non-comparative methods. Such approaches rely only on sequence information and use various low- and high-order indices to capture compositional deviation from the genome backbone; the superiority of high-order indices over low-order ones has been shown elsewhere. However, even high-order k-mers may be poor estimators of HGT when insufficient information is available, e.g. in short sliding windows. Moreover, most current HGT prediction methods require pre-existing annotation, which may restrict their application to newly sequenced genomes. We introduce a novel computational method, Interpolated Variable Order Motifs (IVOMs), which exploits compositional biases using variable-order motif distributions and captures the local composition of a sequence more reliably than fixed-order methods. For optimal localization of the boundaries of each predicted region, a second-order, two-state hidden Markov model (HMM) is implemented in a change-point detection framework. We applied the IVOM approach to the genome of Salmonella enterica serovar Typhi CT18, a prokaryote well studied in terms of HGT events, and show that IVOMs outperform state-of-the-art low- and high-order motif methods, predicting not only the already characterized Salmonella Pathogenicity Islands (SPI-1 to SPI-10) but also three novel SPIs (SPI-15, SPI-16 and SPI-17) and other HGT events.
Availability: The software is available under a GPL license as a standalone application at http://www.sanger.ac.uk/Software/analysis/alien_hunter
Contact: gsv@sanger.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
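The abstract describes the IVOM idea only at a high level. The Python sketch below illustrates the general principle under stated assumptions: each sliding window is scored by how much better a variable-order model trained on the window itself predicts its bases than a model trained on the whole genome. The interpolation (each context of length 0 to `max_order` mixed in with weight n / (n + `pseudo`), where n is the context count) is a Glimmer-style interpolated Markov model used as a stand-in; it is not the published IVOM weighting. The function names, the default window (5,000 bp) and step (2,500 bp), and the toy genome are hypothetical choices for illustration, and the HMM boundary refinement is not sketched.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGT"  # assumes uppercase A/C/G/T input


def context_counts(seq, max_order):
    """Count contexts and (context, next base) pairs for orders 0..max_order."""
    pairs, totals = Counter(), Counter()
    for i, base in enumerate(seq):
        for k in range(min(i, max_order) + 1):
            ctx = seq[i - k:i]  # the k bases preceding position i ("" for k = 0)
            pairs[ctx, base] += 1
            totals[ctx] += 1
    return pairs, totals


def interpolated_prob(model, context, base, pseudo=8.0):
    """P(base | context), built up from order 0 to len(context).

    Each longer context is mixed in with weight n / (n + pseudo), where n is
    how often that context was seen: well-supported high-order contexts
    dominate, sparse ones defer to shorter contexts. This weighting rule is
    an assumption, not the published IVOM formula.
    """
    pairs, totals = model
    p = 1.0 / len(ALPHABET)  # uniform fallback keeps every probability > 0
    for k in range(len(context) + 1):
        ctx = context[len(context) - k:]  # suffix of length k
        n = totals[ctx]
        if n == 0:
            break  # longer contexts contain this one, so they are unseen too
        lam = n / (n + pseudo)
        p = lam * pairs[ctx, base] / n + (1.0 - lam) * p
    return p


def window_score(window, genome_model, max_order=5):
    """Mean log2 ratio of local-model vs genome-model base probabilities.

    High values mean the window's own composition explains it much better
    than the genome backbone does: a candidate horizontally acquired region.
    """
    local_model = context_counts(window, max_order)
    total = 0.0
    for i, base in enumerate(window):
        ctx = window[max(0, i - max_order):i]
        total += math.log2(interpolated_prob(local_model, ctx, base)
                           / interpolated_prob(genome_model, ctx, base))
    return total / len(window)


def scan(genome, window=5000, step=2500, max_order=5):
    """Score sliding windows; returns a list of (start, score) pairs."""
    genome_model = context_counts(genome, max_order)
    return [(start, window_score(genome[start:start + window],
                                 genome_model, max_order))
            for start in range(0, len(genome) - window + 1, step)]


def synthetic_genome(seed=1, flank=30000, island_len=3000):
    """Hypothetical toy data: AT-rich flanks around one GC-rich 'island'."""
    rng = random.Random(seed)

    def draw(letters, n):
        return "".join(rng.choice(letters) for _ in range(n))

    left = draw("AAATTTGC", flank)
    island = draw("GGGCCCAT", island_len)
    right = draw("AAATTTGC", flank)
    return left + island + right


if __name__ == "__main__":
    genome = synthetic_genome()
    scores = scan(genome, window=3000, step=3000, max_order=3)
    start, score = max(scores, key=lambda s: s[1])
    print(f"most atypical window: {start}-{start + 3000}, {score:.2f} bits/base")
```

A design note on the toy run: because the genome model is trained on the full sequence, island included, high-order contexts can partly absorb a large island. The planted island here covers under 5% of the sequence, and `max_order=3` keeps the run fast and limits how much the local model overfits a 3,000 bp window.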