Improved sequence-based prediction of disordered regions with multilayer fusion of multiple information sources
Open Access
- 4 September 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 26 (18) , i489-i496
- https://doi.org/10.1093/bioinformatics/btq373
Abstract
Motivation: Intrinsically disordered proteins play a crucial role in numerous regulatory processes. Their abundance and ubiquity, combined with the relatively small number of available annotations, motivate the development of computational models that predict disordered regions from protein sequences. Although the prediction quality of these methods continues to rise, novel and improved predictors are urgently needed.
Results: We propose a novel method, named MFDp (Multilayered Fusion-based Disorder predictor), that aims to improve over the current disorder predictors. MFDp is an ensemble of three Support Vector Machines specialized for the prediction of short, long and generic disordered regions. It combines three complementary disorder predictors with information from the sequence, sequence profiles, predicted secondary structure, solvent accessibility, backbone dihedral torsion angles, residue flexibility and B-factors. Our method uses a custom-designed set of features that are based on raw predictions and aggregated raw values, and it recognizes various types of disorder. MFDp is compared at the residue level on two datasets against eight recent disorder predictors and the top-performing methods from the most recent CASP8 experiment. Although its training chains share ≤25% similarity with the test sequences, our method consistently and significantly outperforms the other methods in terms of the MCC index. MFDp outperforms modern disorder predictors for the binary disorder assignment and provides competitive real-valued predictions. Its outputs are also shown to outperform the other methods in the identification of proteins with long disordered regions.
Availability: http://biomine.ece.ualberta.ca/MFDp.html
Supplementary information: Supplementary data are available at Bioinformatics online.
Contact: lkurgan@ece.ualberta.ca
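The abstract compares predictors at the residue level using the MCC (Matthews correlation coefficient) index. As an illustration only, the following minimal sketch shows how MCC can be computed for a binary disorder assignment derived from real-valued per-residue scores. The function names, the 0.5 cutoff, and the toy labels and scores are assumptions made for this example; they are not taken from the paper and do not reproduce the authors' evaluation code.

```python
import math


def binarize(scores, threshold=0.5):
    """Turn real-valued per-residue scores into binary labels.

    The 0.5 threshold is an assumption for this sketch, not a value from the paper.
    """
    return [1 if s >= threshold else 0 for s in scores]


def mcc(true_labels, predicted_labels):
    """Matthews correlation coefficient for binary labels (1 = disordered, 0 = ordered)."""
    tp = tn = fp = fn = 0
    for t, p in zip(true_labels, predicted_labels):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention: report 0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical 10-residue chain: annotated disorder and made-up predictor scores.
true_labels = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
scores = [0.9, 0.8, 0.4, 0.2, 0.1, 0.3, 0.6, 0.2, 0.7, 0.9]

predicted = binarize(scores)
print(mcc(true_labels, predicted))  # 0.6 (TP=4, TN=4, FP=1, FN=1)
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which makes it less sensitive than plain accuracy to the imbalance between ordered and disordered residues.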