Linear programming optimization and a double statistical filter for protein threading protocols
- 21 September 2001
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 45 (3) , 241-261
- https://doi.org/10.1002/prot.1145
Abstract
The design of scoring functions (or potentials) for threading, differentiating native‐like from non‐native structures with a limited computational cost, is an active field of research. We revisit two widely used families of threading potentials: the pairwise and profile models. To design optimal scoring functions we use linear programming (LP). The LP protocol makes it possible to measure the difficulty of a particular training set in conjunction with a specific form of the scoring function. Gapless threading demonstrates that pair potentials have larger prediction capacity than profile energies. However, alignments with gaps are easier to compute with profile potentials. We therefore search for and propose a new profile model with prediction capacity comparable to that of contact potentials. A protocol to determine optimal energy parameters for gaps, using LP, is also presented. A statistical test, based on a combination of local and global Z‐scores, is employed to filter out false positives. Extensive tests of the new protocol are presented. The new model provides an efficient alternative to threading with pair energies, maintaining comparable accuracy. The code, databases, and a prediction server are available at http://www.tc.cornell.edu/CBIO/loopp. Proteins 2001;45:241–261.
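The double statistical filter mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define how the local and global Z‐scores are computed or which cutoffs are used, so everything below is an assumption made for illustration: the Z‐score is taken against a set of decoy energies (for example, from shuffled sequences), lower energy is treated as better, and the function names `z_score` and `passes_double_filter` and the thresholds `z_global_min` and `z_local_min` are hypothetical.

```python
from statistics import mean, pstdev


def z_score(candidate_energy, decoy_energies):
    """Z-score of a candidate energy against a decoy distribution.

    Assumption: lower energy is better, so a larger positive Z-score
    means the candidate stands out more from the decoys.
    """
    sigma = pstdev(decoy_energies)
    if sigma == 0:
        raise ValueError("decoy energies have zero variance")
    return (mean(decoy_energies) - candidate_energy) / sigma


def passes_double_filter(global_energy, global_decoys,
                         local_energy, local_decoys,
                         z_global_min=3.0, z_local_min=2.0):
    """Accept a hit only if both Z-scores clear their thresholds.

    The threshold values are placeholders, not values from the paper.
    """
    z_global = z_score(global_energy, global_decoys)
    z_local = z_score(local_energy, local_decoys)
    return z_global >= z_global_min and z_local >= z_local_min


if __name__ == "__main__":
    decoys = [0.0, 2.0, -2.0, 0.0]  # toy decoy energies
    # Strong global signal but weak local signal: rejected.
    print(passes_double_filter(-10.0, decoys, -1.0, decoys))
    # Both signals strong: accepted.
    print(passes_double_filter(-10.0, decoys, -5.0, decoys))
```

Requiring both scores to pass is what makes the filter "double" in this sketch: a hit that looks significant under only one of the two measures is discarded as a likely false positive.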