Frequency of gaps observed in a structurally aligned protein pair database suggests a simple gap penalty function
Open Access
- 1 May 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 32 (9) , 2838-2843
- https://doi.org/10.1093/nar/gkh610
Abstract
Gap penalty is an important component of the scoring scheme that is needed when searching for homologous proteins and for accurate alignment of protein sequences. Most homology search and sequence alignment algorithms employ a heuristic ‘affine gap penalty’ scheme q + r × n, in which q is the penalty for opening a gap, r the penalty for extending it and n the gap length. In order to devise a more rational scoring scheme, we examined the pattern of gaps that occur in a database of structurally aligned protein domain pairs. We find that the logarithm of the frequency of gaps varies linearly with the length of the gap, but with a break at a gap of length 3, and is well approximated by two linear regression lines with R² values of 1.0 and 0.99. The bilinear behavior is retained when gaps are categorized by secondary structures of the two residues flanking the gap. Similar results were obtained when another, totally independent, structurally aligned protein pair database was used. These results suggest a modification of the affine gap penalty function.
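To make the two scoring schemes concrete, the sketch below compares the standard affine penalty q + r × n with one possible two-segment (bilinear) variant whose slope changes at gap length 3. The abstract does not give the modified function or any parameter values, so the function names, the continuous piecewise form, and the numbers used here (`q=10`, `r_short=2`, `r_long=0.5`) are illustrative assumptions only, not the authors' published penalty.

```python
def affine_gap_penalty(n: int, q: float, r: float) -> float:
    """Standard affine penalty q + r * n for a gap of length n (0 if no gap)."""
    if n < 0:
        raise ValueError("gap length must be non-negative")
    return 0.0 if n == 0 else q + r * n


def bilinear_gap_penalty(n: int, q: float, r_short: float, r_long: float,
                         break_len: int = 3) -> float:
    """Hypothetical two-segment penalty with a slope change at break_len.

    Assumption: the two segments join continuously at break_len. The
    abstract only reports that log gap frequency follows two regression
    lines with a break at length 3; it does not specify this exact form.
    """
    if n < 0:
        raise ValueError("gap length must be non-negative")
    if n == 0:
        return 0.0
    if n <= break_len:
        # Short gaps: extension cost r_short per residue.
        return q + r_short * n
    # Long gaps: r_short up to the break, then r_long per extra residue.
    return q + r_short * break_len + r_long * (n - break_len)


if __name__ == "__main__":
    # Illustrative parameter values, not taken from the paper.
    for n in range(0, 8):
        print(n,
              affine_gap_penalty(n, q=10.0, r=1.0),
              bilinear_gap_penalty(n, q=10.0, r_short=2.0, r_long=0.5))
```

With these made-up values, the bilinear variant grows faster than the affine one up to length 3 and more slowly afterwards, which is the qualitative shape a break in the log-frequency curve would imply.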