MaxSubSeq: an algorithm for segment-length optimization. The case study of the transmembrane spanning segments

Abstract
Motivation: A problem in predicting the topography of transmembrane proteins is the optimal localization of the transmembrane segments along the protein sequence, given that each residue is associated with a propensity of being or not being included in the transmembrane region. Previous work has shown that post-processing the propensity signal with suitable algorithms can greatly improve the quality and accuracy of the predictions. In this paper we describe a general dynamic programming-like algorithm (MaxSubSeq, Maximal SubSequence) specifically designed to optimize the number and length of segments of constrained length in a given protein sequence. Previous applications of our algorithm have proved its effectiveness in optimizing the output of both neural networks and hidden Markov models; here we present a detailed description of MaxSubSeq.

Results: We describe the application of MaxSubSeq to the location of both helical and beta-strand transmembrane segments, optimizing the outputs of different predictive algorithms. For all-alpha transmembrane proteins we use both the standard Kyte–Doolittle (KD) hydropathy scale and the TMHMM predictor (http://www.cbs.dtu.dk/). On a set of 188 well-characterized membrane proteins, MaxSubSeq nearly doubles the number of correctly located transmembrane segments compared with the standard KD hydrophobicity plot, reaching 51% accuracy. When MaxSubSeq is used to optimize the TMHMM method, the accuracy increases from 68% to 72%. When used to regularize the prediction of beta transmembrane strands, obtained using both a neural network-based and an HMM-based predictor, MaxSubSeq increases the accuracy per protein to 72% and 73%, respectively.

Availability: The program is available upon request from the authors, or it is accessible through our web server (http://gpcr.biocomp.unibo.it/predictors/).

Contact: piero@biocomp.unibo.it, casadio@alma.unibo.it
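The abstract does not give the MaxSubSeq recurrence, so the following is only a minimal sketch of the general idea it describes: a dynamic program that selects non-overlapping segments of constrained length so as to maximize the total per-residue score. It should not be read as the authors' implementation. The function name `max_segments`, the parameters `min_len`, `max_len` and `min_gap`, and the convention that scores are propensities shifted so that negative values disfavor membership are all assumptions made for illustration.

```python
from itertools import accumulate


def max_segments(scores, min_len, max_len, min_gap=1):
    """Pick non-overlapping segments maximizing the summed score.

    Hypothetical illustration, not the published MaxSubSeq code.
    scores:  per-residue values (assumed positive = favors membrane).
    min_len, max_len: allowed segment lengths (inclusive).
    min_gap: minimum number of residues between two segments.
    Returns (best_total, [(start, end), ...]) with end exclusive.
    """
    n = len(scores)
    # prefix[i] is the sum of scores[0:i], so a segment sum is O(1).
    prefix = [0.0] + list(accumulate(scores))
    best = [0.0] * (n + 1)   # best[i]: optimum over the first i residues
    choice = [0] * (n + 1)   # length of a segment ending at i, 0 if none

    for i in range(1, n + 1):
        best[i] = best[i - 1]            # residue i-1 left outside any segment
        for length in range(min_len, min(max_len, i) + 1):
            start = i - length
            prev = max(0, start - min_gap)   # last index usable before the gap
            cand = best[prev] + prefix[i] - prefix[start]
            if cand > best[i]:
                best[i] = cand
                choice[i] = length

    # Trace back through the stored choices to recover the segments.
    segments = []
    i = n
    while i > 0:
        length = choice[i]
        if length == 0:
            i -= 1
        else:
            start = i - length
            segments.append((start, i))
            i = max(0, start - min_gap)
    segments.reverse()
    return best[n], segments


if __name__ == "__main__":
    toy = [-1, 2, 2, 2, -1, -1, 3, 3, -1]
    print(max_segments(toy, min_len=2, max_len=3))
```

In this sketch the number of segments is not fixed in advance; it falls out of the optimization, which mirrors the stated goal of optimizing both the number and the length of segments. For helical segments one would presumably choose a length window around typical membrane-spanning helix lengths, but the concrete bounds used by the authors are not stated in the abstract.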
