Prediction of integral membrane protein type by collocated hydrophobic amino acid pairs

19 June 2008

journal article
research article
Published by Wiley in Journal of Computational Chemistry

Vol. 30 (1), 163-172
https://doi.org/10.1002/jcc.21053

Abstract

A computational model, IMP‐TYPE, is proposed for the classification of five types of integral membrane proteins from protein sequence. The model aims not only to provide accurate predictions but, more importantly, to incorporate interesting and transparent biological patterns. Compared with the best‐performing existing models, IMP‐TYPE reduces their error rates by 19% and 34% in two out‐of‐sample tests performed on benchmark datasets. Our empirical evaluations also show that the proposed method provides even larger improvements, namely 29% and 45% error rate reductions, when predictions are made for sequences that share low (40%) identity with sequences from the training dataset. We also show that IMP‐TYPE can be used in a standalone mode, i.e., it duplicates a significant majority of the correct predictions provided by other leading methods while also providing correct predictions for sequences that the other methods classify incorrectly. Our method computes predictions using a Support Vector Machine classifier that takes a feature‐based encoding of the sequence as its input. The input feature set includes hydrophobic AA pairs, which were selected using a consensus of three feature selection algorithms. Several recent studies of integral membrane proteins have shown that the hydrophobic residues that make up these AA pairs are associated with the formation of transmembrane helices. Our study also indicates that Met and Phe display a certain degree of hydrophobicity, which may be more crucial than their polarity or aromaticity when they occur in transmembrane segments. This conclusion is supported by a recent study on the potential of mean force for membrane protein folding and by a study of scales for the membrane propensity of amino acids. © 2008 Wiley Periodicals, Inc. J Comput Chem, 2009
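To illustrate what a feature-based encoding built from collocated hydrophobic AA pairs might look like, the following is a minimal sketch and not the IMP‐TYPE implementation. The abstract does not specify the residue set, the spacing between paired residues, or the normalization, so the `HYDROPHOBIC` alphabet, the `max_gap` parameter, and the division by sequence length are all assumptions made here for illustration; the function name `collocated_pair_features` is hypothetical.

```python
from itertools import product

# Assumption: an illustrative hydrophobic residue set. The abstract names
# only Met (M) and Phe (F) explicitly; the other letters are placeholders.
HYDROPHOBIC = "AFILMVW"


def collocated_pair_features(sequence, max_gap=2, alphabet=HYDROPHOBIC):
    """Count residue pairs (a, b) separated by 0..max_gap intervening
    positions, restricted to residues in `alphabet`, and normalize each
    count by the sequence length (an assumed normalization)."""
    seq = sequence.upper()

    # One feature per (first residue, second residue, gap) combination.
    counts = {
        (a, b, gap): 0
        for gap in range(max_gap + 1)
        for a, b in product(alphabet, repeat=2)
    }

    for gap in range(max_gap + 1):
        step = gap + 1  # distance between the two paired positions
        for i in range(len(seq) - step):
            key = (seq[i], seq[i + step], gap)
            if key in counts:  # skip pairs with a residue outside the alphabet
                counts[key] += 1

    n = max(len(seq), 1)  # avoid division by zero on empty input
    return {key: value / n for key, value in counts.items()}


if __name__ == "__main__":
    features = collocated_pair_features("MFMF", max_gap=1)
    # Show only the pairs that actually occur in the toy sequence.
    for key, value in sorted(features.items()):
        if value > 0:
            print(key, value)
```

A fixed-length vector of this kind could then be passed to a Support Vector Machine or to feature selection routines; which pairs and spacings are actually retained by the published model is not stated in the abstract.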
