Predicting Membrane Protein Types Using Residue-Pair Models Based on a Reduced-Similarity Dataset

Abstract
An algorithm is proposed to predict membrane protein types based on the multi-residue-pair effect in a Markov model. On a newly constructed dataset of 835 membrane proteins with very low sequence similarity, the overall prediction accuracy reaches 81.1% in the resubstitution test and 71.7% in the jackknife test for five types: type I single-pass, type II single-pass, and multi-pass membrane proteins, and lipid chain-anchored and GPI-anchored membrane proteins. In the jackknife test this is an improvement of about 11% over the component-coupled algorithm based solely on amino acid composition (the AAC approach). The improvement is also confirmed on a high-similarity dataset and in another extrapolation test. The results imply that more incisive analysis tools should be developed on representative datasets with lower sequence similarity. The present algorithm is useful for expediting the determination of the types and functions of new membrane proteins and may be useful for large-scale systematic analysis of functional genome data. The computer program is available on request.
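The abstract does not spell out the model, so what follows is only a minimal sketch under stated assumptions. It reads "multi-residue-pair effect in a Markov model" as one transition table P(b | a, k) per type for residues separated by k = 1..K positions. A query protein is assigned to the type whose tables give it the highest log-likelihood, and the jackknife test is treated as leave-one-out cross-validation. The gap limit `max_gap`, the Laplace `pseudocount`, all function names, and the toy sequences are hypothetical illustrations, not the authors' program or data.

```python
import math
from collections import defaultdict

# The 20 standard amino acids; any other symbol (e.g. X, B, Z) is skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train_pair_model(sequences, max_gap=3, pseudocount=1.0):
    """Estimate log P(b | a, k) for residue pairs (a, b) separated by k = 1..max_gap.

    Hypothetical reading of the multi-residue-pair Markov model: one
    transition table per gap k, smoothed with a Laplace pseudocount.
    """
    model = {}
    for k in range(1, max_gap + 1):
        counts = {a: {b: pseudocount for b in AMINO_ACIDS} for a in AMINO_ACIDS}
        for seq in sequences:
            for i in range(len(seq) - k):
                a, b = seq[i], seq[i + k]
                if a in counts and b in counts:
                    counts[a][b] += 1
        model[k] = {}
        for a, row in counts.items():
            total = sum(row.values())
            model[k][a] = {b: math.log(c / total) for b, c in row.items()}
    return model


def log_likelihood(model, seq):
    """Sum the log transition probabilities of every residue pair in seq."""
    score = 0.0
    for k, table in model.items():
        for i in range(len(seq) - k):
            a, b = seq[i], seq[i + k]
            if a in table and b in table:
                score += table[a][b]
    return score


def train_all(dataset, max_gap=3):
    """Train one residue-pair model per membrane-protein type."""
    by_type = defaultdict(list)
    for seq, label in dataset:
        by_type[label].append(seq)
    return {label: train_pair_model(seqs, max_gap) for label, seqs in by_type.items()}


def predict(models, seq):
    """Assign seq to the type whose model gives it the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(models[label], seq))


def resubstitution_accuracy(dataset, max_gap=3):
    """Train and test on the same data (an optimistic self-consistency check)."""
    models = train_all(dataset, max_gap)
    hits = sum(predict(models, seq) == label for seq, label in dataset)
    return hits / len(dataset)


def jackknife_accuracy(dataset, max_gap=3):
    """Leave-one-out: each protein is predicted by models trained on all the others."""
    hits = 0
    for i, (seq, label) in enumerate(dataset):
        models = train_all(dataset[:i] + dataset[i + 1:], max_gap)
        hits += predict(models, seq) == label
    return hits / len(dataset)


if __name__ == "__main__":
    # Toy sequences for illustration only; not data from the study.
    toy = [
        ("ACACACACAC", "typeA"), ("CACACACACA", "typeA"), ("ACACACAC", "typeA"),
        ("DEDEDEDEDE", "typeB"), ("EDEDEDEDED", "typeB"), ("DEDEDEDE", "typeB"),
    ]
    print("resubstitution:", resubstitution_accuracy(toy))
    print("jackknife:", jackknife_accuracy(toy))
```

Because every type's model is evaluated over the same residue pairs of the query, raw log-likelihoods are directly comparable across types without length normalization. Retraining inside the leave-one-out loop keeps each held-out protein out of its own model, at the cost of one full training pass per protein.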