Prediction of In-Vivo Modification Sites of Proteins from Their Primary Structures

Abstract
In order to make better use of the information contained in rapidly expanding amino acid sequence data, we present a new method for predicting various modification sites of proteins from their primary structures. The method is also applicable to the prediction of other functional sites in proteins. Here we present examples for N-glycosylation sites and serine/threonine phosphorylation sites. The method is essentially an elaboration of consensus sequence pattern matching based on stepwise discriminant analysis. Each amino acid occurring near a potential modification site is represented by six numerical values that reflect various properties of amino acids. Longer-range effects around these sites are also considered. The stepwise procedure allowed us to select effective features for discrimination automatically. A computer program implementing our method first identifies potential modification sites by a sequence pattern, NX(S/T) for N-glycosylation or (S/T) for phosphorylation, and then uses discriminant analysis to decide whether each potential site is likely to be a true modification site. The prediction accuracy in this second, discriminant step was about 60% for glycosylation sites and about 80% for phosphorylation sites.
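The two-step procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' program. The function names are hypothetical. The abstract does not give the six-value amino acid property table, the window size, or the discriminant weights and bias, so the caller must supply all of them. In the described method the weights would come from stepwise discriminant analysis, where features not selected effectively receive zero weight. That training step and the longer-range features are not shown here. Step 2 is written as a generic two-group linear discriminant function (score = bias + sum of weight times feature, accept if positive).

```python
import re
from typing import Dict, List, Sequence

# Step 1 patterns as stated in the abstract: NX(S/T) and (S/T).
# The lookahead lets overlapping sequons (e.g. "NNST") all be reported.
GLYCO_SEQUON = re.compile(r"(?=N.[ST])")
ZERO = (0.0,) * 6  # padding for positions off either end of the sequence


def glycosylation_candidates(seq: str) -> List[int]:
    """0-based positions of N in each NX(S/T) match (X = any residue)."""
    return [m.start() for m in GLYCO_SEQUON.finditer(seq)]


def phosphorylation_candidates(seq: str) -> List[int]:
    """Every serine or threonine is a potential phosphorylation site."""
    return [i for i, aa in enumerate(seq) if aa in "ST"]


def window_features(seq: str, site: int, half_width: int,
                    properties: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate the six property values (hypothetical table supplied by
    the caller) of each residue in [site - half_width, site + half_width].
    Off-sequence positions and unknown residues contribute six zeros."""
    feats: List[float] = []
    for pos in range(site - half_width, site + half_width + 1):
        aa = seq[pos] if 0 <= pos < len(seq) else ""
        feats.extend(properties.get(aa, ZERO))
    return feats


def discriminant_score(features: Sequence[float], weights: Sequence[float],
                       bias: float) -> float:
    """Linear discriminant function: bias + sum(w_i * x_i)."""
    if len(features) != len(weights):
        raise ValueError("feature/weight length mismatch")
    return bias + sum(w * x for w, x in zip(weights, features))


def predict_sites(seq: str, candidates: List[int], half_width: int,
                  properties: Dict[str, Sequence[float]],
                  weights: Sequence[float], bias: float) -> List[int]:
    """Step 2: keep the candidate sites whose discriminant score is positive."""
    return [s for s in candidates
            if discriminant_score(window_features(seq, s, half_width, properties),
                                  weights, bias) > 0]
```

As a toy usage example with made-up numbers (not fitted values), suppose the first property is 1.0 for proline and 0.0 otherwise. With a window half-width of 1, the weight at index 12 is -1.0, all other weights are 0.0, and the bias is 0.5. Then `predict_sites("MNGSAKNPTLS", glycosylation_candidates("MNGSAKNPTLS"), 1, toy, weights, 0.5)` keeps the site at position 1 and rejects the one at position 6, whose following residue is proline.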
