Two‐stage support vector regression approach for predicting accessible surface areas of amino acids

Abstract
We address the problem of predicting the solvent accessible surface area (ASA) of amino acid residues in protein sequences, without classifying the residues into buried and exposed types. A two‐stage support vector regression (SVR) approach is proposed to predict real values of ASA from position‐specific scoring matrices generated from PSI‐BLAST profiles. The second SVR stage captures the influence of neighboring residues' ASA values on the ASA value of each residue. With this stage added, the two‐stage SVR approach improves mean absolute error by up to 3.3% and achieves correlation coefficients of 0.66, 0.68, and 0.67 on the Manesh dataset of 215 proteins, the Barton dataset of 502 nonhomologous proteins, and the Carugo dataset of 338 proteins, respectively, which are better than the scores published earlier on these datasets. A Web server for protein ASA prediction using the two‐stage SVR method is available (http://birc.ntu.edu.sg/~pas0186457/asa.html). Proteins 2006.
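
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the window half-widths, the zero padding at the sequence ends, and the toy averaging regressors in the usage example are all assumptions made for illustration. In the published method both stages are trained SVR models; here each stage is any callable that maps a feature vector to a number. The sketch also includes the two evaluation measures named in the abstract, mean absolute error and the correlation coefficient (taken here to be Pearson's).

```python
from math import sqrt
from typing import Callable, List, Sequence

Vector = List[float]
Regressor = Callable[[Vector], float]  # stand-in for a trained SVR model


def windows(rows: Sequence[Sequence[float]], half_width: int,
            pad: Sequence[float]) -> List[Vector]:
    """Build one flattened sliding window per residue position.

    Positions that fall outside the sequence are filled with `pad`
    (an assumed convention; the abstract does not specify one).
    """
    n = len(rows)
    out: List[Vector] = []
    for i in range(n):
        feat: Vector = []
        for j in range(i - half_width, i + half_width + 1):
            feat.extend(rows[j] if 0 <= j < n else pad)
        out.append(feat)
    return out


def two_stage_predict(pssm: Sequence[Sequence[float]],
                      stage1: Regressor, stage2: Regressor,
                      w1: int = 1, w2: int = 1) -> List[float]:
    """Stage 1 maps PSSM windows to initial ASA values; stage 2 maps
    windows of those initial values to refined ASA values."""
    zero_row = [0.0] * len(pssm[0])
    first = [stage1(x) for x in windows(pssm, w1, zero_row)]
    # Stage 2 sees each residue's stage-1 value together with its neighbors'.
    second_inputs = windows([[v] for v in first], w2, [0.0])
    return [stage2(x) for x in second_inputs]


def mean_absolute_error(pred: Sequence[float], true: Sequence[float]) -> float:
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def pearson(pred: Sequence[float], true: Sequence[float]) -> float:
    mp = sum(pred) / len(pred)
    mt = sum(true) / len(true)
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_t = sum((t - mt) ** 2 for t in true)
    return cov / sqrt(var_p * var_t)


def toy_mean(x: Vector) -> float:
    """Toy regressor (NOT an SVR): averages its input features."""
    return sum(x) / len(x)


if __name__ == "__main__":
    # Hypothetical 3-residue profile with one score column per residue.
    toy_pssm = [[1.0], [2.0], [3.0]]
    refined = two_stage_predict(toy_pssm, toy_mean, toy_mean)
    print(refined)
    print(mean_absolute_error(refined, [1.0, 2.0, 1.0]))
```

Keeping the regressors as plain callables separates the windowing logic from the learning method, so trained SVR models from any library could be passed in place of `toy_mean` without changing the rest of the sketch.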
