Large‐scale prediction of disulphide bridges using kernel methods, two‐dimensional recursive neural networks, and weighted graph matching

30 November 2005

journal article
research article
Published by Wiley in Proteins: Structure, Function, and Bioinformatics

Vol. 62 (3), 617-629
https://doi.org/10.1002/prot.20787

Abstract

The formation of disulphide bridges between cysteines plays an important role in protein folding, structure, function, and evolution. Here, we develop new methods for predicting disulphide bridges in proteins. We first build a large curated data set of proteins containing disulphide bridges to extract relevant statistics. We then use kernel methods to predict whether a given protein chain contains intrachain disulphide bridges, and recursive neural networks to predict the bonding probability of each pair of cysteines in the chain. These probabilities in turn lead to an accurate estimate of the total number of disulphide bridges and to a weighted graph matching problem that can be solved efficiently to infer the global disulphide bridge connectivity pattern. This approach can be applied both when the bonded state of each cysteine is known and in ab initio mode, where the state is unknown. Furthermore, it can easily cope with chains containing an arbitrary number of disulphide bridges, overcoming one of the major limitations of previous approaches. It can classify individual cysteine residues as bonded or nonbonded with 87% specificity and 89% sensitivity. The estimate of the total number of bridges in each chain is correct 71% of the time, and within one of the true value over 94% of the time. The prediction of the overall disulphide connectivity pattern is exact in about 51% of the chains. In addition to using profiles in the input to leverage evolutionary information, including true (but not predicted) secondary structure and solvent accessibility information yields small but noticeable improvements. Finally, once the system is trained, predictions can be computed rapidly on a proteomic or protein-engineering scale. The disulphide bridge prediction server (DIpro), software, and datasets are available through www.igb.uci.edu/servers/psss.html. Proteins 2006.
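
The final step described in the abstract, turning pairwise bonding probabilities into a connectivity pattern, can be illustrated with a small sketch. It assumes the probabilities arrive as a symmetric matrix and that the number of bridges has already been estimated. The function name `best_connectivity`, the example values, and the exhaustive memoized search are illustrative assumptions, not the authors' implementation. The paper refers to an efficient weighted graph matching algorithm, for which this brute-force search is only a stand-in suitable for short chains.

```python
from functools import lru_cache


def best_connectivity(prob, n_bridges):
    """Pick n_bridges disjoint cysteine pairs with maximum total probability.

    prob is a symmetric matrix (list of lists); prob[i][j] is the assumed
    bonding probability of cysteines i and j. Returns (total_weight, pairs).
    """
    n = len(prob)
    if 2 * n_bridges > n:
        raise ValueError("not enough cysteines for the requested bridges")

    @lru_cache(maxsize=None)
    def solve(used, remaining):
        # used: bitmask of cysteines already paired or left unbonded
        if remaining == 0:
            return 0.0, ()
        i = 0
        while i < n and (used >> i) & 1:
            i += 1
        undecided = n - bin(used).count("1")
        if undecided < 2 * remaining:
            return float("-inf"), ()  # too few cysteines left
        # Option 1: leave cysteine i unbonded
        best = solve(used | (1 << i), remaining)
        # Option 2: bond cysteine i with some later undecided cysteine j
        for j in range(i + 1, n):
            if not (used >> j) & 1:
                w, pairs = solve(used | (1 << i) | (1 << j), remaining - 1)
                if w + prob[i][j] > best[0]:
                    best = (w + prob[i][j], ((i, j),) + pairs)
        return best

    return solve(0, n_bridges)


# Made-up probabilities for a chain with four cysteines (illustration only)
example = [
    [0.0, 0.9, 0.1, 0.2],
    [0.9, 0.0, 0.3, 0.1],
    [0.1, 0.3, 0.0, 0.8],
    [0.2, 0.1, 0.8, 0.0],
]
weight, pairs = best_connectivity(example, 2)
print(pairs)  # ((0, 1), (2, 3))
```

Because the search enumerates subsets of cysteines, its cost grows exponentially with chain length. A polynomial-time maximum weight matching algorithm would be the natural replacement at the scale the abstract describes.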
