Mining α-Helix-Forming Molecular Recognition Features with Cross Species Sequence Alignments
- 1 November 2007
- journal article
- research article
- Published by American Chemical Society (ACS) in Biochemistry
- Vol. 46 (47) , 13468-13477
- https://doi.org/10.1021/bi7012273
Abstract
Previously described algorithms for mining α-helix-forming molecular recognition elements (MoREs), described by Oldfield et al. (Oldfield, C. J., Cheng, Y., Cortese, M. S., Brown, C. J., Uversky, V. N., and Dunker, A. K. (2005) Comparing and combining predictors of mostly disordered proteins, Biochemistry 44, 1989−2000), also known as molecular recognition features (MoRFs) (Mohan, A., Oldfield, C. J., Radivojac, P., Vacic, V., Cortese, M. S., Dunker, A. K., and Uversky, V. N. (2006) Analysis of Molecular Recognition Features (MoRFs), J. Mol. Biol. 362, 1043−1059), revealed that regions undergoing disorder-to-order transition are involved in many molecular recognition events and are crucial for protein−protein interactions. However, these algorithms were developed using a training data set of a limited size. Here we propose to improve the prediction algorithms by (1) including additional α-MoRF examples and their cross-species homologues in the positive training set, (2) carefully extracting monomer structure chains from the Protein Data Bank (PDB) as the negative training set, (3) including attributes from recently developed disorder predictors, secondary structure predictions, and amino acid indices, and (4) constructing neural-network-based predictors and performing validation. Over 50 regions which undergo disorder-to-order transition that were identified in the PDB together with a set of corresponding cross-species homologues of each structure-based example were included in a new positive training set. Over 1500 attributes, including disorder predictions, secondary structure predictions, and amino acid indices, were evaluated by the conditional probability method. The top attributes, including VSL2 and VL3 disorder predictions and several physicochemical propensities of amino acid residues, were used to develop the feed-forward neural networks.
The sensitivity, specificity, and accuracy of the resulting predictor, α-MoRF-PredII, were 0.87 ± 0.10, 0.87 ± 0.11, and 0.87 ± 0.08 over 10 cross validations, respectively. We present the results of these analyses and validation examples to discuss the potential improvement of the α-MoRF-PredII prediction accuracy.
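As an illustration of how figures such as "0.87 ± 0.10 over 10 cross validations" can be computed, the following minimal Python sketch derives sensitivity, specificity, and accuracy from per-fold confusion counts and summarizes them as mean and spread. It rests on assumptions not stated in the abstract: the standard textbook definitions of the three metrics, and the sample standard deviation as the "±" term. The function names and the fold counts are hypothetical and are not data from the paper.

```python
from statistics import mean, stdev


def fold_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) for one validation fold.

    Assumes the standard definitions; the paper may define accuracy
    differently (for example, as the average of sensitivity and specificity).
    """
    sensitivity = tp / (tp + fn)            # fraction of true MoRF examples recovered
    specificity = tn / (tn + fp)            # fraction of negatives correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def summarize(folds):
    """Summarize per-fold metrics as {name: (mean, sample standard deviation)}.

    `folds` is a list of (tp, fn, tn, fp) tuples, one per fold.
    Using the sample standard deviation for the spread is an assumption.
    """
    per_fold = [fold_metrics(*counts) for counts in folds]
    names = ("sensitivity", "specificity", "accuracy")
    columns = zip(*per_fold)                # regroup values metric by metric
    return {name: (mean(col), stdev(col)) for name, col in zip(names, columns)}


if __name__ == "__main__":
    # Hypothetical confusion counts for three folds (illustrative only).
    example_folds = [(8, 2, 9, 1), (10, 0, 7, 3), (9, 1, 8, 2)]
    for name, (avg, spread) in summarize(example_folds).items():
        print(f"{name}: {avg:.2f} ± {spread:.2f}")
```

With small validation folds, a single misclassified example shifts a fold's sensitivity or specificity noticeably, which is one plausible reason a spread on the order of 0.1 can accompany a mean of 0.87.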
This publication has 73 references indexed in Scilit:
- Functional Anthology of Intrinsic Disorder. 2. Cellular Components, Domains, Technical Terms, Developmental Processes, and Coding Sequence Diversities Correlated with Long Disordered Regions. Journal of Proteome Research, 2007
- Functional Anthology of Intrinsic Disorder. 1. Biological Processes and Functions of Proteins with Long Disordered Regions. Journal of Proteome Research, 2007
- Protein Intrinsic Disorder and Human Papillomaviruses: Increased Amount of Disorder in E6 and E7 Oncoproteins from High Risk HPVs. Journal of Proteome Research, 2006
- Intrinsic Disorder in Transcription Factors. Biochemistry, 2006
- Exploiting heterogeneous sequence properties improves prediction of protein disorder. Proteins-Structure Function and Bioinformatics, 2005
- Coupled Folding and Binding with α-Helix-Forming Molecular Recognition Elements. Biochemistry, 2005
- Showing your ID: intrinsic disorder as an ID for recognition, regulation and cell signaling. Journal of Molecular Recognition, 2005
- Comparing and Combining Predictors of Mostly Disordered Proteins. Biochemistry, 2005
- Predicting intrinsic disorder from amino acid sequence. Proteins-Structure Function and Bioinformatics, 2003
- Protein folding revisited. A polypeptide chain at the folding–misfolding–nonfolding cross-roads: which way to go? Cellular and Molecular Life Sciences, 2003