An Evolutionary-Network Model Reveals Stratified Interactions in the V3 Loop of the HIV-1 Envelope

Abstract
The third variable loop (V3) of the human immunodeficiency virus type 1 (HIV-1) envelope is a principal determinant of antibody neutralization and progression to AIDS. Although it is undoubtedly an important target for vaccine research, extensive genetic variation in V3 remains an obstacle to the development of an effective vaccine. Comparative methods that exploit the abundance of sequence data can detect interactions between residues of rapidly evolving proteins such as the HIV-1 envelope, revealing biological constraints on their variability. However, previous studies have relied implicitly on two biologically unrealistic assumptions: (1) that founder effects in the evolutionary history of the sequences can be ignored, and (2) that statistical associations between residues occur exclusively in pairs. We show that comparative methods that neglect the evolutionary history of extant sequences are susceptible to a high rate of false positives (20%–40%). Therefore, we propose a new method to detect interactions that relaxes both of these assumptions. First, we reconstruct the evolutionary history of extant sequences by maximum likelihood, shifting focus from extant sequence variation to the underlying substitution events. Second, we analyze the joint distribution of substitution events among positions in the sequence as a Bayesian graphical model, in which each branch in the phylogeny is a unit of observation. We perform extensive validation of our models using both simulations and a control case of known interactions in HIV-1 protease, and apply this method to detect interactions within V3 from a sample of 1,154 HIV-1 envelope sequences. Our method greatly reduces the number of false positives due to founder effects, while capturing several higher-order interactions among V3 residues. By mapping these interactions to a structural model of the V3 loop, we find that the loop is stratified into distinct evolutionary clusters. We extend our model to detect interactions between the V3 and C4 domains of the HIV-1 envelope, and account for the uncertainty in mapping substitutions to the tree with a parametric bootstrap.

Author Summary
The third variable loop (V3) of the human immunodeficiency virus type 1 (HIV-1) envelope is a principal determinant of viral growth characteristics and an important target for the immune system. Interactions between residues of V3 allow the virus to shift between combinations of residues to escape the immune system while retaining its structure and functions. Comparative study of HIV-1 V3 sequences can detect such interactions through the covariation of sites in the sequence, which can then be used to inform vaccine development. However, current methods for detecting such associations rely on biologically unrealistic assumptions. We demonstrate that these assumptions cause an excessive number of spurious associations, and present a new approach that couples phylogenetic and Bayesian network models and greatly reduces this number while retaining the ability to detect real associations. Our analysis reveals that the V3 loop is stratified into discrete layers of interacting residues, suggesting a partition of functions along this viral structure with implications for vaccine development.
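
The idea of treating each branch of the phylogeny as a unit of observation can be illustrated with a toy sketch. It assumes substitutions have already been mapped to branches, encodes each branch as a binary vector over sites, and asks whether a two-node network with an edge between two sites explains the data better than one without. The score is a K2-style marginal likelihood with uniform priors, which is an assumption made for illustration and not necessarily the scoring or structure-learning procedure used in the study. All names (`branch_subs`, `branch_matrix`, `edge_log_bayes_factor`) and the data are hypothetical.

```python
from math import lgamma


def log_marginal(n1, n0):
    """Log marginal likelihood of n1 ones and n0 zeros for a binary
    variable under a uniform Beta(1, 1) prior: log(n1! n0! / (n1+n0+1)!)."""
    return lgamma(n1 + 1) + lgamma(n0 + 1) - lgamma(n1 + n0 + 2)


def branch_matrix(branch_subs, n_sites):
    """One row per branch, one column per site; 1 marks a substitution
    mapped to that site on that branch (hypothetical input format)."""
    return [[1 if s in subs else 0 for s in range(n_sites)]
            for subs in branch_subs]


def edge_log_bayes_factor(rows, i, j):
    """Log Bayes factor for 'site j depends on site i' versus
    'site j is independent of site i'. The score of site i itself
    is the same under both models and cancels."""
    n = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for r in rows:
        n[(r[i], r[j])] += 1
    independent = log_marginal(n[(0, 1)] + n[(1, 1)], n[(0, 0)] + n[(1, 0)])
    dependent = (log_marginal(n[(0, 1)], n[(0, 0)])
                 + log_marginal(n[(1, 1)], n[(1, 0)]))
    return dependent - independent


# Toy data: sites 0 and 1 always substitute on the same branches;
# site 2 substitutes at the same rate regardless of site 0.
branch_subs = [{0, 1}] * 8 + [set()] * 8 + [{2}] * 4 + [{0, 1, 2}] * 4
rows = branch_matrix(branch_subs, n_sites=3)

print(edge_log_bayes_factor(rows, 0, 1))  # positive: edge supported
print(edge_log_bayes_factor(rows, 0, 2))  # negative: edge not supported
```

Because the rows are branches and not extant sequences, a substitution pair inherited by many descendants of one ancestor is counted once, which is the intuition behind reducing false positives from founder effects. A full analysis would search over networks with more than two nodes, which this sketch does not attempt.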