A New Perspective on V3 Phenotype Prediction

Abstract
The coreceptor that a strain of HIV-1 uses to enter a host cell is highly indicative of its pathology. Coreceptor usage is determined primarily by the amino acid sequence of the V3 loop region of the viral envelope glycoprotein. The canonical approach to sequence-based prediction of coreceptor usage was derived by statistical analysis of a data set that was less reliable and significantly smaller than what is presently available. We aimed to produce a superior phenotypic classifier by applying modern machine learning (ML) techniques to the current database of V3 loop sequences of known phenotype. The trained classifiers, along with the sequence data, are available for public use at the supplementary website: http://genomiac2.ucsd.edu:8080/wetcat/v3.html