Optimization of the Sliding Window Size for Protein Structure Prediction

Abstract
Sliding window based methods are frequently applied to predict various aspects of protein structure. Despite their widespread use, researchers have not established a standard window size; window sizes ranging from 7 to 17 residues have been used in the past. To this end, this paper performs a computational study, based on a probabilistic approach, that aims to find an optimal sliding window size. The results show that the formation of helical structure can be affected by amino acids (AAs) that are up to 9 positions away in the sequence, while the formation of coils and strands can be affected by AAs that are up to 3 and 6 positions away, respectively. Overall, our results suggest that a sliding window of 19 residues is optimal for secondary structure prediction, while for specific prediction tasks, such as prediction of β-strands, a smaller window size is sufficient. Finally, the 20 AAs are categorized into five groups based on their influence on the formation of secondary structure. The finding related to the optimal window size was confirmed by an independent experimental study on the prediction of protein secondary structure.
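
To make the window sizes above concrete, the following minimal Python sketch shows how a symmetric sliding window could be extracted around each residue. It is an illustration only, not the method used in this study. The function names (sliding_windows, window_size), the padding symbol "X", and the assumption that the window is centered on the predicted residue (so that a reach of 9 positions on each side gives 2 * 9 + 1 = 19 residues) are hypothetical choices made for this example.

```python
def window_size(reach):
    """Size of a symmetric window that covers `reach` positions on each side
    of the central residue (assumption: the window is centered)."""
    if reach < 0:
        raise ValueError("reach must be non-negative")
    return 2 * reach + 1


def sliding_windows(sequence, reach, pad="X"):
    """Return one window per residue, each of length 2 * reach + 1.

    The sequence is padded at both ends with `pad` (a hypothetical
    placeholder symbol) so that terminal residues also get full windows.
    """
    size = window_size(reach)
    padded = pad * reach + sequence + pad * reach
    return [padded[i:i + size] for i in range(len(sequence))]


# Maximum sequence distances reported in the abstract, per structure type.
REACH = {"helix": 9, "strand": 6, "coil": 3}

if __name__ == "__main__":
    for structure, reach in REACH.items():
        print(structure, "window size:", window_size(reach))
    # Toy sequence with a small reach, just to show the window layout.
    print(sliding_windows("ACDEF", 1))
```

Under the centered-window assumption, the reported reach of 9 positions for helices corresponds to the 19-residue window, while the shorter reaches for strands and coils would correspond to 13 and 7 residues; the latter two sizes are derived here for illustration and are not stated in the abstract.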