Transmembrane Topology and Signal Peptide Prediction Using Dynamic Bayesian Networks

Open Access

7 November 2008

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Computational Biology

Vol. 4 (11), e1000213
https://doi.org/10.1371/journal.pcbi.1000213

Abstract

Hidden Markov models (HMMs) have been successfully applied to the tasks of transmembrane protein topology prediction and signal peptide prediction. In this paper we expand upon this work by making use of the more powerful class of dynamic Bayesian networks (DBNs). Our model, Philius, is inspired by a previously published HMM, Phobius, and combines a signal peptide submodel with a transmembrane submodel. We introduce a two-stage DBN decoder that combines the power of posterior decoding with the grammar constraints of Viterbi-style decoding. Philius also provides protein type, segment, and topology confidence metrics to aid in the interpretation of the predictions. We report a relative improvement of 13% over Phobius in full-topology prediction accuracy on transmembrane proteins, and a sensitivity and specificity of 0.96 in detecting signal peptides. We also show that our confidence metrics correlate well with the observed precision. In addition, we have made predictions on all 6.3 million proteins in the Yeast Resource Center (YRC) database. This large-scale study provides an overall picture of the relative numbers of proteins that include a signal-peptide and/or one or more transmembrane segments as well as a valuable resource for the scientific community. All DBNs are implemented using the Graphical Models Toolkit. Source code for the models described here is available at http://noble.gs.washington.edu/proj/philius. A Philius Web server is available at http://www.yeastrc.org/philius, and the predictions on the YRC database are available at http://www.yeastrc.org/pdr.

Author Summary

Transmembrane proteins control the flow of information and substances into and out of the cell and are involved in a broad range of biological processes. Their interfacing role makes them rewarding drug targets, and it is estimated that more than 50% of recently launched drugs target membrane proteins. However, experimentally determining the three-dimensional structure of a transmembrane protein is still a difficult task, and few of the currently known tertiary structures are of transmembrane proteins despite the fact that as many as one quarter of the proteins in a given organism are transmembrane proteins. Computational methods for predicting the basic topology of a transmembrane protein are therefore of great interest, and these methods must be able to distinguish between mature, membrane-spanning proteins and proteins that, when first synthesized, contain an N-terminal membrane-spanning signal peptide. In this work, we present Philius, a new computational approach that outperforms previous methods in simultaneously detecting signal peptides and correctly predicting the topology of transmembrane proteins. Philius also supplies a set of confidence scores with each prediction. A Philius Web server is available to the public as well as precomputed predictions for over six million proteins in the Yeast Resource Center database.
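
The two-stage decoding idea mentioned in the abstract (posterior decoding followed by a grammar-constrained, Viterbi-style pass) can be illustrated on a toy model. The sketch below is not the Philius implementation: it uses a three-state HMM rather than a DBN, does not use the Graphical Models Toolkit, and every name and number in it (the state labels, `ALLOWED`, `TRANS`, the hydrophobic residue set, the emission values) is an assumption chosen only for illustration. Stage 1 computes per-position posteriors with forward-backward; stage 2 finds the highest-scoring label path that uses only permitted transitions, scoring each position by its log posterior.

```python
import math

# Toy labels (assumed): i = inside loop, M = membrane, o = outside loop.
STATES = ["i", "M", "o"]
# Toy grammar (assumed): a loop can change sides only by crossing the membrane.
ALLOWED = {("i", "i"), ("i", "M"), ("M", "M"), ("M", "i"),
           ("M", "o"), ("o", "o"), ("o", "M")}
START = {"i": 0.5, "M": 0.0, "o": 0.5}
TRANS = {
    "i": {"i": 0.9, "M": 0.1, "o": 0.0},
    "M": {"i": 0.1, "M": 0.8, "o": 0.1},
    "o": {"i": 0.0, "M": 0.1, "o": 0.9},
}
HYDROPHOBIC = set("AILMFVW")  # illustrative residue grouping, not from the paper


def emit(state, residue):
    """Toy emission: P(hydrophobic or not | state); membrane favors hydrophobic."""
    hydrophobic = residue in HYDROPHOBIC
    if state == "M":
        return 0.8 if hydrophobic else 0.2
    return 0.3 if hydrophobic else 0.7


def normalize(row):
    z = sum(row.values())
    return {s: v / z for s, v in row.items()}


def posteriors(seq):
    """Stage 1: forward-backward posterior marginals P(state at t | seq)."""
    fwd = []
    for t, residue in enumerate(seq):
        row = {}
        for s in STATES:
            prior = START[s] if t == 0 else sum(
                fwd[-1][p] * TRANS[p][s] for p in STATES)
            row[s] = prior * emit(s, residue)
        fwd.append(normalize(row))  # rescale each step to avoid underflow
    bwd = [dict.fromkeys(STATES, 1.0)]
    for t in range(len(seq) - 1, 0, -1):
        row = {s: sum(TRANS[s][q] * emit(q, seq[t]) * bwd[0][q] for q in STATES)
               for s in STATES}
        bwd.insert(0, normalize(row))
    return [normalize({s: f[s] * b[s] for s in STATES})
            for f, b in zip(fwd, bwd)]


def constrained_decode(post):
    """Stage 2: best path under ALLOWED transitions, scored by log posteriors."""
    if not post:
        return ""
    eps = 1e-12  # keeps log() finite when a posterior is exactly zero
    score = {s: math.log(post[0][s] + eps) for s in STATES}
    back = []
    for row in post[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            legal = [p for p in STATES if (p, s) in ALLOWED]
            best = max(legal, key=lambda p: score[p])
            new_score[s] = score[best] + math.log(row[s] + eps)
            pointers[s] = best
        score = new_score
        back.append(pointers)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


seq = "MKTSDEKRAILLVFAWLLIVAGGSDEKRNQ"  # made-up sequence
post = posteriors(seq)
unconstrained = "".join(max(STATES, key=lambda s: row[s]) for row in post)
print(unconstrained)             # per-position argmax; may break the grammar
print(constrained_decode(post))  # always a path the grammar permits
```

The per-position argmax of the posteriors maximizes expected label accuracy but can produce a label sequence that the grammar forbids. Restricting the second pass to `ALLOWED` transitions guarantees a legal path while still scoring by posteriors, which is the general trade-off the abstract describes.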
