Comparative phyloinformatics of virus genes at micro and macro levels in a distributed computing environment
Open Access
- 13 February 2008
- journal article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 9 (S1) , S23
- https://doi.org/10.1186/1471-2105-9-s1-s23
Abstract
Background: Preparedness for a possible global pandemic caused by viruses such as the highly pathogenic influenza A subtype H5N1 has become a global priority. In particular, it is critical to monitor the appearance of any newly emerging subtypes. Comparative phyloinformatics can be used to monitor, analyze, and possibly predict the evolution of viruses. However, utilizing the full functionality of available analysis packages for large-scale phyloinformatics studies requires a team of computer scientists, biostatisticians and virologists, a requirement that cannot be fulfilled in many cases. Furthermore, the time complexities of many of the algorithms involved lead to prohibitive runtimes on sequential computer platforms. This has so far hindered the use of comparative phyloinformatics as a commonly applied tool in this area.

Results: In this paper, the graphically oriented workflow design system called Quascade and its efficient usage for comparative phyloinformatics are presented. In particular, we focus on how this task can be effectively performed in a distributed computing environment. As a proof of concept, the designed workflows are used for the phylogenetic analysis of neuraminidase of H5N1 isolates (micro level) and influenza viruses (macro level). The results of this paper are hence twofold. Firstly, this paper demonstrates the usefulness of a graphical user interface system to design and execute complex distributed workflows for large-scale phyloinformatics studies of virus genes. Secondly, the analysis of neuraminidase at different levels of complexity provides valuable insights into this virus's tendency for geography-based clustering in the phylogenetic tree and also shows the importance of glycan sites in its molecular evolution.

Conclusion: The current study demonstrates the efficiency and utility of workflow systems that provide a biologist-friendly approach to the analysis of complex biological datasets using high-performance computing. In particular, the utility of the platform Quascade for deploying distributed and parallelized versions of a variety of computationally intensive phylogenetic algorithms has been shown. In addition, the analysis of the H5N1 neuraminidase datasets at macro and micro levels has clearly indicated a pattern of spatial clustering of the H5N1 viral isolates based on geographical distribution rather than on temporal or host-range-based clustering.