Partitioning the Genetic Diversity of a Virus Family: Approach and Evaluation through a Case Study of Picornaviruses
- 1 April 2012
- journal article
- research article
- Published by American Society for Microbiology in Journal of Virology
- Vol. 86 (7), 3890-3904
- https://doi.org/10.1128/jvi.07173-11
Abstract
The recent advent of genome sequences as the only source available to classify many newly discovered viruses challenges the development of virus taxonomy by expert virologists, who traditionally rely on extensive virus characterization. In this proof-of-principle study, we address this issue by presenting a computational approach (DEmARC) to classify viruses of a family into groups at hierarchical levels using a sole criterion: intervirus genetic divergence. To quantify genetic divergence, we used pairwise evolutionary distances (PEDs) estimated by maximum likelihood inference on a multiple alignment of family-wide conserved proteins. PEDs were calculated for all virus pairs, and the resulting distribution was modeled via a mixture of probability density functions. The model enables the quantitative inference of regions of distance discontinuity in the family-wide PED distribution, which define the levels of hierarchy. For each level, a limit on genetic divergence, below which two viruses join the same group, was objectively selected among a set of candidates by minimizing violations of the limit by intragroup PEDs. In a case study, we applied the procedure to hundreds of genome sequences of picornaviruses and extensively evaluated it by modulating four key parameters. We found that the genetics-based classification largely tolerates variations in virus sampling and multiple alignment construction but is affected by the choice of protein and the measure of genetic divergence. In an accompanying paper (C. Lauber and A. E. Gorbalenya, J. Virol. 86:3905-3915, 2012), we analyze the substantial insight gained with the genetics-based classification approach by comparing it with the expert-based picornavirus taxonomy.
This publication has 71 references indexed in Scilit:
- Viral Mutation Rates. Journal of Virology, 2010
- RDP3: a flexible and fast computer program for analyzing recombination. Bioinformatics, 2010
- Classification of papillomaviruses (PVs) based on 189 PV types and proposal of taxonomic amendments. Published by Elsevier, 2010
- A Novel Picornavirus Associated with Gastroenteritis. Journal of Virology, 2009
- Database resources of the National Center for Biotechnology Information. Nucleic Acids Research, 2009
- Impact of Exogenous Sequences on the Characteristics of an Epidemic Type 2 Recombinant Vaccine-Derived Poliovirus. Journal of Virology, 2008
- Improvement of Phylogenies after Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments. Systematic Biology, 2007
- Evidence for emergence of diverse polioviruses from C-cluster coxsackie A viruses and implications for global poliovirus eradication. Proceedings of the National Academy of Sciences, 2007
- MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 2004
- CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 1994
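
As a rough illustration of the limit-selection step described in the abstract, the sketch below groups items under a candidate divergence limit and counts intragroup distances that violate it. It is not the DEmARC implementation: the abstract does not state the clustering rule, so single-linkage grouping is an assumption here, and the function names (`group_by_limit`, `count_violations`, `pick_limit`) and the tie-breaking rule are hypothetical. The input is assumed to be a symmetric matrix of precomputed PEDs.

```python
from itertools import combinations


def group_by_limit(dist, limit):
    """Assign group labels; two items are linked when their distance is below
    the limit (single linkage, an assumption made for this sketch)."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if dist[i][j] < limit:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def count_violations(dist, labels, limit):
    """Count pairs placed in the same group whose distance is not below the limit."""
    return sum(
        1
        for i, j in combinations(range(len(dist)), 2)
        if labels[i] == labels[j] and dist[i][j] >= limit
    )


def pick_limit(dist, candidates):
    """Return the candidate limit with the fewest intragroup violations.
    Ties go to the smallest candidate (an arbitrary choice for this sketch)."""
    scored = [
        (count_violations(dist, group_by_limit(dist, c), c), c)
        for c in candidates
    ]
    return min(scored)[1]
```

In the paper's procedure the candidate limits come from the regions of distance discontinuity inferred by the mixture model; here they are simply passed in as a list, for example `pick_limit(dist, [0.2, 0.3])`.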