An analysis of protein folding type prediction by seed-propagated sampling and jackknife test

Abstract
In developing statistical methods for predicting protein folding types, how to test the predicted results is a crucial problem. In addition to the resubstitution test, in which the folding type of each protein in a training set is predicted using rules derived from that same set, cross-validation tests are needed. Among these, the single-test-set method seems to be the least reliable because of the arbitrariness in selecting the test set. The leaving-one-out (or jackknife) test is more objective and hence more reliable, but when the training set is not large enough, leaving each protein out in turn may cause a severe loss of information. To overcome this drawback, a seed-propagated sampling approach is proposed that can generate any number of simulated proteins of a desired type from a given training database. No predetermined assumption about the statistical distribution function of the amino acid frequencies is needed. Combined with the existing cross-validation methods, the new technique may provide a more objective estimate of the performance of various protein-folding-type prediction methods.
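The difference between the resubstitution test and the jackknife test can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the nearest-centroid rule stands in for whatever prediction method is being tested, the function names are hypothetical, and the two-component vectors are made-up placeholders for 20-component amino acid compositions, not data from this work. The seed-propagated sampling step itself is not sketched here.

```python
import numpy as np

def nearest_centroid_predict(train_X, train_y, x):
    """Assign x to the class whose mean feature vector is closest (Euclidean).

    A stand-in prediction rule, assumed only for this illustration.
    """
    classes = sorted(set(train_y))
    centroids = {
        c: train_X[[i for i, label in enumerate(train_y) if label == c]].mean(axis=0)
        for c in classes
    }
    return min(classes, key=lambda c: np.linalg.norm(x - centroids[c]))

def resubstitution_accuracy(X, y):
    """Predict every protein with rules derived from the full training set."""
    hits = sum(nearest_centroid_predict(X, y, X[i]) == y[i] for i in range(len(y)))
    return hits / len(y)

def jackknife_accuracy(X, y):
    """Leave each protein out in turn, derive rules from the rest, predict it.

    Assumes every class keeps at least one member after the removal.
    """
    hits = 0
    for i in range(len(y)):
        keep = [j for j in range(len(y)) if j != i]
        pred = nearest_centroid_predict(X[keep], [y[j] for j in keep], X[i])
        hits += pred == y[i]
    return hits / len(y)

# Made-up toy feature vectors and folding-type labels (illustration only).
X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0],
              [6.0, 0.0], [7.0, 0.0], [8.0, 0.0]])
y = ["alpha", "alpha", "alpha", "beta", "beta", "beta"]

print("resubstitution:", resubstitution_accuracy(X, y))
print("jackknife:     ", jackknife_accuracy(X, y))
```

On this toy set the resubstitution rate is higher than the jackknife rate, because removing one protein from a three-member class shifts that class centroid noticeably. This is the small-training-set information loss described above.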