Neighbor-Dependent Ramachandran Probability Distributions of Amino Acids Developed from a Hierarchical Dirichlet Process Model
Open Access
- 29 April 2010
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 6 (4), e1000763
- https://doi.org/10.1371/journal.pcbi.1000763
Abstract
Distributions of the backbone dihedral angles of proteins have been studied for over 40 years. While many statistical analyses have been presented, only a handful of probability densities are publicly available for use in structure validation and structure prediction methods. The available distributions differ in a number of important ways, which determine their usefulness for various purposes. These include: 1) input data size and criteria for structure inclusion (resolution, R-factor, etc.); 2) filtering of suspect conformations and outliers using B-factors or other features; 3) secondary structure of input data (e.g., whether helix and sheet are included; whether beta turns are included); 4) the method used for determining probability densities ranging from simple histograms to modern nonparametric density estimation; and 5) whether they include nearest neighbor effects on the distribution of conformations in different regions of the Ramachandran map. In this work, Ramachandran probability distributions are presented for residues in protein loops from a high-resolution data set with filtering based on calculated electron densities. Distributions for all 20 amino acids (with cis and trans proline treated separately) have been determined, as well as 420 left-neighbor and 420 right-neighbor dependent distributions. The neighbor-independent and neighbor-dependent probability densities have been accurately estimated using Bayesian nonparametric statistical analysis based on the Dirichlet process. In particular, we used hierarchical Dirichlet process priors, which allow sharing of information between densities for a particular residue type and different neighbor residue types. The resulting distributions are tested in a loop modeling benchmark with the program Rosetta, and are shown to improve protein loop conformation prediction significantly. The distributions are available at http://dunbrack.fccc.edu/hdp. 
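The information sharing that a hierarchical Dirichlet process prior provides can be illustrated with a minimal sketch. This is not the authors' implementation: the truncation level, the concentration values, the three neighbor labels, and the function names `stick_breaking` and `group_weights` are all assumptions chosen for illustration, and the mixture components themselves (densities over the Ramachandran map) are left abstract. Global mixture weights are drawn once by truncated stick-breaking, and each neighbor type then draws its own weights over the same components, centered on the global ones.

```python
import random


def stick_breaking(concentration, truncation, rng):
    """Truncated stick-breaking (GEM) weights that sum to 1."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, concentration)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last component absorbs the leftover stick
    return weights


def group_weights(global_weights, alpha, rng):
    """Draw group weights ~ Dirichlet(alpha * global_weights).

    Uses the standard construction: normalize independent gamma variates.
    The small floor on the shape only guards against a zero shape parameter.
    """
    draws = [rng.gammavariate(max(alpha * b, 1e-12), 1.0) for b in global_weights]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)

# Global weights shared by every neighbor-dependent density (assumed values).
beta = stick_breaking(concentration=2.0, truncation=15, rng=rng)

# One weight vector per neighbor type, all over the same 15 components.
neighbors = ["ALA", "GLY", "PRO"]  # hypothetical subset of neighbor types
pi = {n: group_weights(beta, alpha=5.0, rng=rng) for n in neighbors}
```

Because every neighbor type reuses the same components and differs only in its weights, a neighbor type with few observations can still borrow structure learned from the others. A larger `alpha` pulls each group's weights closer to the global weights.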
The three-dimensional structure of a protein enables it to perform its specific function, which may be catalysis, DNA binding, cell signaling, maintaining cell shape and structure, or one of many other functions. Predicting the structures of proteins is an important goal of computational biology. One way of doing this is to figure out the rules that determine protein structure from protein sequences by determining how local protein sequence is associated with local protein structure. That is, many (but not all) of the interactions that determine protein structure occur between amino acids that are a short distance away from each other in the sequence. This is particularly true in the irregular parts of protein structure, often called loops. In this work, we have performed a statistical analysis of the structure of the protein backbone in loops as a function of the protein sequence. We have determined how an amino acid bends the local backbone due to its amino acid type and the amino acid types of its neighbors. We used a recently developed statistical method that is particularly suited to this problem. The analysis shows that backbone conformation prediction can be improved using the information in the statistical distributions we have developed.
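To make the idea of a neighbor-dependent distribution concrete, here is a small sketch of the simplest estimator, a normalized histogram of backbone dihedral angles tallied per (left neighbor, residue) pair. It is the kind of simple-histogram baseline the abstract contrasts with nonparametric density estimation, not the method used in the paper. The 30-degree bin width, the function names, and the toy angle values are assumptions made up for illustration.

```python
from collections import defaultdict

BIN_WIDTH = 30  # degrees; an assumed bin size, not taken from the paper


def bin_index(angle_deg, width=BIN_WIDTH):
    """Map an angle in degrees to a bin on the periodic range [-180, 180)."""
    n_bins = int(360 // width)
    wrapped = (angle_deg + 180.0) % 360.0
    return int(wrapped // width) % n_bins  # modulo keeps +180 and -180 together


def neighbor_histograms(residues, width=BIN_WIDTH):
    """Tally (phi, psi) bins for each (left neighbor, residue) pair.

    residues: list of (amino_acid, phi, psi) tuples in sequence order.
    Returns {(left_neighbor, amino_acid): {(phi_bin, psi_bin): probability}}.
    The first residue has no left neighbor and is skipped.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for (left, _, _), (aa, phi, psi) in zip(residues, residues[1:]):
        cell = (bin_index(phi, width), bin_index(psi, width))
        counts[(left, aa)][cell] += 1
    result = {}
    for key, cells in counts.items():
        total = sum(cells.values())
        result[key] = {cell: n / total for cell, n in cells.items()}
    return result


# Toy input with made-up angles, only to show the data layout.
toy_loop = [
    ("GLY", -60.0, -45.0),
    ("ALA", -65.0, -40.0),
    ("GLY", 60.0, 30.0),
    ("ALA", -120.0, 130.0),
]
hist = neighbor_histograms(toy_loop)
```

With real data many (neighbor, residue, bin) cells would be sparsely populated, which is the motivation for estimators that smooth and share information across neighbor types.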