Classification of hepatitis C virus and human immunodeficiency virus-1 sequences with the branching index
Open Access
- 1 September 2008
- journal article
- research article
- Published by Microbiology Society in Journal of General Virology
- Vol. 89 (9) , 2098-2107
- https://doi.org/10.1099/vir.0.83657-0
Abstract
Classification of viral sequences should be fast, objective, accurate and reproducible. Most methods that classify sequences use either pair-wise distances or phylogenetic relations, but cannot discern when a sequence is unclassifiable. The branching index (BI) combines distance and phylogeny methods to compute a ratio that quantifies how closely a query sequence clusters with a subtype clade. In the hypothesis-testing framework of statistical inference, the BI is compared with a threshold to test whether sufficient evidence exists for the query sequence to be classified among known sequences. If above the threshold, the null hypothesis of no support for the subtype relation is rejected and the sequence is taken as belonging to the subtype clade with which it clusters on the tree. This study evaluates statistical properties of the BI for subtype classification in hepatitis C virus (HCV) and human immunodeficiency virus-1 (HIV-1). Pairs of BI values with known positive- and negative-test results were computed from 10 000 random fragments of reference alignments. Sampled fragments were of sufficient length to contain phylogenetic signals that grouped reference sequences together properly into subtype clades. For HCV, a threshold BI of 0.71 yields 95.1 % agreement with reference subtypes, with equal false-positive and false-negative rates. For HIV-1, a threshold of 0.66 yields 93.5 % agreement. Higher thresholds can be used where lower false-positive rates are required. In synthetic recombinants, regions without breakpoints are recognized accurately; regions with breakpoints do not represent any known subtype uniquely. Web-based services for viral subtype classification with the BI are available online.
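The decision rule described in the abstract can be illustrated with a minimal Python sketch. It assumes the BI value and the subtype clade with which the query clusters have already been computed elsewhere; the abstract does not give the BI formula, so none is attempted here. The function name `classify`, its parameters, and the `BI_THRESHOLDS` table are hypothetical names introduced for illustration; only the two threshold values come from the abstract.

```python
from typing import Optional

# Thresholds reported in the abstract (equal false-positive and
# false-negative rates). The dictionary itself is an illustrative construct.
BI_THRESHOLDS = {"HCV": 0.71, "HIV-1": 0.66}


def classify(bi: float, clustered_subtype: str, virus: str,
             threshold: Optional[float] = None) -> Optional[str]:
    """Apply the threshold test to a precomputed branching index.

    Returns the subtype of the clade the query clusters with when the BI
    is above the threshold (null hypothesis of no support rejected),
    otherwise None, meaning the query is treated as unclassifiable.
    """
    # A caller may pass a higher threshold where fewer false positives
    # are required; otherwise the reported per-virus value is used.
    cutoff = BI_THRESHOLDS[virus] if threshold is None else threshold
    return clustered_subtype if bi > cutoff else None
```

For example, `classify(0.80, "1a", "HCV")` returns `"1a"`, whereas `classify(0.60, "1a", "HCV")` returns `None`. Passing `threshold=0.9` corresponds to the stricter setting that the abstract suggests where lower false-positive rates are required.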