Distribution of indel lengths
- 8 August 2001
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 45 (1), 102-104
- https://doi.org/10.1002/prot.1129
Abstract
Protein sequence alignment has become a widely used method in the study of newly sequenced proteins. Most sequence alignment methods use an affine gap penalty to assign scores to insertions and deletions. Although affine gap penalties capture the relative ease of extending a gap compared with initiating one, they remain an obvious oversimplification of the real processes that occur during sequence evolution. To improve the efficiency of sequence alignment methods and to obtain a better understanding of the process of sequence evolution, we wanted to find a more accurate model of insertions and deletions in homologous proteins. In this work, we extract the probability of a gap occurrence and the resulting gap length distribution in distantly related proteins (sequence identity < 25%) using alignments based on their common structures. We observe a distribution of gaps that can be fitted with a multiexponential with four distinct components. The results suggest new approaches to modeling insertions and deletions in sequence alignments. Proteins 2001;45:102–104.
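The contrast drawn in the abstract between an affine gap penalty and a four-component multiexponential gap length distribution can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the function names, the convention that a gap of length 1 costs only the opening penalty, the default penalty values, and above all the four (amplitude, decay length) pairs, which are placeholders and not the fitted values reported in the article.

```python
import math

def affine_gap_penalty(length, gap_open=10.0, gap_extend=1.0):
    """Affine penalty: opening cost plus a fixed cost per additional residue.

    Assumed convention: a gap of length 1 costs gap_open only.
    """
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return gap_open + gap_extend * (length - 1)

def multiexponential(length, components):
    """Sum of exponentials: sum_i a_i * exp(-length / tau_i)."""
    return sum(a * math.exp(-length / tau) for a, tau in components)

def implied_gap_cost(length, components):
    """Cost implied by a gap length distribution, taken as -ln(frequency)."""
    return -math.log(multiexponential(length, components))

# Hypothetical (amplitude, decay length) pairs, NOT values from the article.
EXAMPLE_COMPONENTS = [(0.5, 1.0), (0.3, 3.0), (0.15, 10.0), (0.05, 30.0)]

def marginal_cost(cost_fn, length):
    """Extra cost of extending a gap from `length` to `length + 1`."""
    return cost_fn(length + 1) - cost_fn(length)

if __name__ == "__main__":
    for n in (1, 5, 20):
        affine_step = marginal_cost(affine_gap_penalty, n)
        multi_step = marginal_cost(
            lambda k: implied_gap_cost(k, EXAMPLE_COMPONENTS), n
        )
        print(n, affine_step, round(multi_step, 3))
```

Under the affine model the cost of extending a gap by one residue is the same at every length, whereas the cost implied by a sum of exponentials with distinct decay lengths falls as the gap grows, because the slowly decaying components dominate at long lengths. That difference is one way to read the abstract's remark that the affine penalty oversimplifies.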