Boosting Protein Threading Accuracy
Open Access
- 1 January 2009
- book chapter
- Published by Springer Nature
- Vol. 5541, 31-45
- https://doi.org/10.1007/978-3-642-02008-7_3
Abstract
Protein threading is one of the most successful protein structure prediction methods. Most protein threading methods use a scoring function that linearly combines sequence and structure features to measure the quality of a sequence-template alignment, so that a dynamic programming algorithm can be used to optimize the scoring function. However, a linear scoring function cannot fully exploit interdependency among features and thus limits alignment accuracy. This paper presents a nonlinear scoring function for protein threading, which not only can model interactions among different protein features, but also can be efficiently optimized using a dynamic programming algorithm. We achieve this by modeling the threading problem with a probabilistic graphical model, Conditional Random Fields (CRF), and training the model with the gradient tree boosting algorithm. The resultant model is a nonlinear scoring function consisting of a collection of regression trees. Each regression tree models a type of nonlinear relationship among sequence and structure features. Experimental results indicate that this new threading model can effectively leverage weak biological signals and greatly improve both alignment accuracy and fold recognition rate.
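The idea in the abstract, a score built from regression trees that is still optimized by dynamic programming, can be illustrated with a minimal sketch. This is not the chapter's model: the two features (profile similarity and secondary-structure agreement), the single hand-written tree, the constant gap penalty, and all function names are hypothetical, and a trained CRF would learn its trees and use state-dependent scores rather than the simple global alignment shown here.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical feature vector for one (sequence position, template position) pair:
# (profile similarity, secondary-structure agreement), both assumed to lie in [0, 1].
Features = Tuple[float, float]
Tree = Callable[[Features], float]


def interaction_tree(x: Features) -> float:
    """A hand-written depth-2 regression tree (illustrative, not learned).

    The large reward is given only when BOTH features are high, an
    interaction that a purely linear combination cannot express.
    """
    if x[0] > 0.5:
        return 2.0 if x[1] > 0.5 else 0.2
    return -1.0


def boosted_score(trees: Sequence[Tree], x: Features) -> float:
    """Nonlinear match score: the sum of the outputs of all regression trees."""
    return sum(tree(x) for tree in trees)


def align(
    n: int,
    m: int,
    features: Callable[[int, int], Features],
    trees: Sequence[Tree],
    gap: float = -0.5,  # assumed constant gap penalty
) -> Tuple[float, List[Tuple[int, int]]]:
    """Global alignment by dynamic programming over an n-by-m grid.

    dp[i][j] holds the best score for aligning the first i sequence
    positions with the first j template positions.
    """
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0], back[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        dp[0][j], back[0][j] = j * gap, "left"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = boosted_score(trees, features(i - 1, j - 1))
            candidates = [
                (dp[i - 1][j - 1] + match, "diag"),  # align i-1 with j-1
                (dp[i - 1][j] + gap, "up"),          # gap in the template
                (dp[i][j - 1] + gap, "left"),        # gap in the sequence
            ]
            dp[i][j], back[i][j] = max(candidates, key=lambda c: c[0])

    # Trace back from the bottom-right corner to recover aligned pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


def toy_features(i: int, j: int) -> Features:
    """Toy data: positions agree on both features only on the diagonal."""
    return (1.0, 1.0) if i == j else (0.0, 0.0)


if __name__ == "__main__":
    score, pairs = align(2, 2, toy_features, [interaction_tree])
    print(score, pairs)
```

Because the tree ensemble is evaluated independently for each cell of the grid, the nonlinearity does not change the recurrence, which is why the alignment can still be found by dynamic programming.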
This publication has 81 references indexed in Scilit:
- Metabolic costs of bat echolocation in a non-foraging context support a role in communication. Frontiers in Physiology, 2013
- Experimental evidence for group hunting via eavesdropping in echolocating bats. Proceedings Of The Royal Society B-Biological Sciences, 2009
- Conclusions beyond support: overconfident estimates in mixed models. Behavioral Ecology, 2008
- Support Vector Training of Protein Alignment Models. Journal of Computational Biology, 2008
- Discriminative learning for protein conformation sampling. Proteins-Structure Function and Bioinformatics, 2008
- MUSTER: Improving protein sequence profile–profile alignments by using multiple sources of structure information. Proteins-Structure Function and Bioinformatics, 2008
- Homology modeling using parametric alignment ensemble generation with consensus and energy-based model selection. Nucleic Acids Research, 2006
- FFAS03: a server for profile-profile sequence alignments. Nucleic Acids Research, 2005
- Single‐body residue‐level knowledge‐based energy score combined with sequence‐profile and secondary structure information for fold recognition. Proteins-Structure Function and Bioinformatics, 2004
- Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features. Biopolymers, 1983