Identifying disordered regions in proteins from amino acid sequence

Abstract
A rule-based predictor and several neural network predictors were developed for identifying disordered regions in proteins. The rule-based predictor, which relied on the observation that disordered regions contain few aromatic amino acids, was suitable only for very long disordered regions, whereas the neural network predictors were developed separately for short, medium, and long disordered regions (LDRs). The out-of-sample prediction accuracies on a residue-by-residue basis ranged from 69 to 74% for the neural network predictors when applied to the length class on which they were developed, but fell to 59 to 67% when applied to different length classes. Testing the rule-based predictor on a residue-by-residue basis using out-of-sample LDRs gave a success rate of 70%. Application of both the rule-based and LDR neural network predictors to large databases of protein sequences provides strong evidence that disordered regions are very common in nature. These results are consistent with our recent proposal that disordered regions are crucial for the evolution of molecular recognition.
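
As an illustration only, the idea behind an aromatic-content rule and a residue-by-residue score can be sketched as follows. This is not the predictor described above: the aromatic set (F, W, Y), the window length, the threshold, and the function names are all assumptions chosen to make the sketch concrete.

```python
# Hypothetical sketch: flag residues lying in windows with few aromatic residues.
# The aromatic set, window size and threshold are illustrative assumptions,
# not values taken from the paper.
AROMATIC = frozenset("FWY")  # assumption: Phe, Trp, Tyr


def aromatic_fraction(window: str) -> float:
    """Return the fraction of residues in `window` that are aromatic."""
    return sum(res in AROMATIC for res in window) / len(window)


def flag_low_aromatic(seq: str, window: int = 40,
                      max_fraction: float = 0.0) -> list[bool]:
    """Mark every residue covered by at least one window whose
    aromatic fraction is <= max_fraction."""
    seq = seq.upper()
    flags = [False] * len(seq)
    # Sequences shorter than `window` yield no windows and no flags.
    for start in range(len(seq) - window + 1):
        if aromatic_fraction(seq[start:start + window]) <= max_fraction:
            for i in range(start, start + window):
                flags[i] = True
    return flags


def per_residue_accuracy(predicted: list[bool], actual: list[bool]) -> float:
    """Fraction of residues whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("label lists must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

For example, `flag_low_aromatic("G" * 10 + "F" + "S" * 10, window=5)` flags every residue except the single phenylalanine, because each window that contains it has a nonzero aromatic fraction. Because the rule only fires when a whole window is poor in aromatics, a sketch like this naturally favors long regions, which matches the limitation noted for the rule-based predictor.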