Variable gap penalty for protein sequence–structure alignment
Open Access
- 19 January 2006
- journal article
- research article
- Published by Oxford University Press (OUP) in Protein Engineering, Design and Selection
- Vol. 19 (3) , 129-133
- https://doi.org/10.1093/protein/gzj005
Abstract
The penalty for inserting gaps into an alignment between two protein sequences is a major determinant of the alignment accuracy. Here, we present an algorithm for finding a globally optimal alignment by dynamic programming that can use a variable gap penalty (VGP) function of any form. We also describe a specific function that depends on the structural context of an insertion or deletion. It penalizes gaps that are introduced within regions of regular secondary structure, buried regions, straight segments and also between two spatially distant residues. The parameters of the penalty function were optimized on a set of 240 sequence pairs of known structure, spanning the sequence identity range of 20–40%. We then tested the algorithm on another set of 238 sequence pairs of known structures. The use of the VGP function increases the number of correctly aligned residues from 81.0 to 84.5% in comparison with the optimized affine gap penalty function; this difference is statistically significant according to Student's t-test. We estimate that the new algorithm allows us to produce comparative models with an additional ∼7 million accurately modeled residues in the ∼1.1 million proteins that are detectably related to a known structure.
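To make the idea of a gap penalty "of any form" concrete, the following is a minimal sketch of global alignment by dynamic programming in which each gap is scored by an arbitrary callback. It is not the authors' algorithm: it uses the textbook general-gap recurrence, which runs in O(nm(n+m)) time, and the abstract does not say how the published method organizes its recurrence. The names `align_vgp`, `make_penalties` and `align_to_structure`, the match scores, and all penalty values are hypothetical. Only one structural term (an extra cost for gaps in helices, marked `H`) is illustrated; burial, segment straightness and spatial distance are left out.

```python
from typing import Callable, List, Tuple

NEG_INF = float("-inf")


def align_vgp(
    n: int,
    m: int,
    match: Callable[[int, int], float],
    ins_pen: Callable[[int, int, int], float],
    del_pen: Callable[[int, int, int], float],
) -> Tuple[float, List[Tuple[int, int]]]:
    """Global alignment with arbitrary, position-dependent gap penalties.

    match(i, j)      score for pairing sequence residue i with structure residue j
    ins_pen(k, i, j) penalty for leaving sequence residues k..i-1 unaligned,
                     inserted just before structure residue j
    del_pen(k, j, i) penalty for leaving structure residues k..j-1 unaligned,
                     just before sequence residue i
    """
    # D[i][j]: best score for the first i sequence and first j structure residues.
    D = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best, prev = NEG_INF, None
            if i > 0 and j > 0:  # pair residue i-1 with residue j-1
                best, prev = D[i - 1][j - 1] + match(i - 1, j - 1), (i - 1, j - 1)
            for k in range(i):  # gap in the structure of any length i-k
                cand = D[k][j] - ins_pen(k, i, j)
                if cand > best:
                    best, prev = cand, (k, j)
            for k in range(j):  # gap in the sequence of any length j-k
                cand = D[i][k] - del_pen(k, j, i)
                if cand > best:
                    best, prev = cand, (i, k)
            D[i][j], back[i][j] = best, prev

    # Trace back; only diagonal steps correspond to aligned residue pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if pi == i - 1 and pj == j - 1:
            pairs.append((pi, pj))
        i, j = pi, pj
    pairs.reverse()
    return D[n][m], pairs


def make_penalties(ss: str, open_pen: float = 1.0, extend_pen: float = 0.5,
                   helix_pen: float = 3.0):
    """Hypothetical context-dependent penalties; all values are illustrative."""

    def del_pen(k: int, j: int, i: int) -> float:
        # Extra cost for every helix residue that the gap removes.
        helix = sum(1 for c in ss[k:j] if c == "H")
        return open_pen + extend_pen * (j - k - 1) + helix_pen * helix

    def ins_pen(k: int, i: int, j: int) -> float:
        # Extra cost when the insertion splits two consecutive helix residues.
        inside = 0 < j < len(ss) and ss[j - 1] == "H" and ss[j] == "H"
        return open_pen + extend_pen * (i - k - 1) + (helix_pen if inside else 0.0)

    return ins_pen, del_pen


def align_to_structure(seq: str, struc: str, ss: str):
    """Align seq to a structure with residues struc and secondary structure ss."""
    ins_pen, del_pen = make_penalties(ss)
    match = lambda i, j: 2.0 if seq[i] == struc[j] else -1.0  # toy scores
    return align_vgp(len(seq), len(struc), match, ins_pen, del_pen)
```

Under these toy settings, aligning `"AB"` to the structure `"AAB"` places the unavoidable deletion on whichever of the two `A` residues is not in a helix: with `ss="HCC"` the second `A` is skipped, and with `ss="CHC"` the first one is. Passing the penalties as callbacks keeps the recurrence independent of the penalty's form, at the cost of the extra inner loops over gap lengths.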