SIMPROT: Using an empirically determined indel distribution in simulations of protein evolution
Open Access
- 27 September 2005
- journal article
- software
- Published by Springer Nature in BMC Bioinformatics
- Vol. 6 (1), 1-7
- https://doi.org/10.1186/1471-2105-6-236
Abstract
Background: General protein evolution models help determine the baseline expectations for the evolution of sequences, and they have proven widely useful both in sequence analysis and in the computer simulation of artificial sequence data sets.
Results: We have developed a new method of simulating protein sequence evolution that includes insertion and deletion (indel) events in addition to amino-acid substitutions. The simulation generates both the simulated sequence family and the true sequence alignment, which captures the evolutionary relationships between amino acids from different sequences. Our statistical model for indel evolution is based on the empirical indel distribution determined by Qian and Goldstein. We have parameterized this distribution so that it applies to sequences diverged by varying evolutionary times, and generalized it to provide flexibility in simulation conditions. Our method uses a Monte Carlo simulation strategy and has been implemented in a C++ program named Simprot.
Conclusion: Simprot will be useful for testing methods of analysis of protein sequence families, particularly alignment methods, phylogenetic tree building, detection of recombination and horizontal gene transfer, and homology detection, where knowing the true course of sequence evolution is essential.
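To make the idea of a Monte Carlo simulation that returns both a sequence family and its true alignment more concrete, the following is a minimal Python sketch. It is not Simprot's C++ code, and several simplifications are assumptions made only for illustration: substitutions pick uniformly among the 19 other amino acids (a realistic simulator would use an empirical substitution matrix), indel lengths come from a geometric placeholder rather than the Qian and Goldstein distribution, and the rates, the tree format, and the names `indel_length`, `evolve_branch`, `simulate` and `true_alignment` are all hypothetical. The true alignment comes from labelling every residue with a homology tag. Inherited residues keep their tag, while inserted residues receive fresh ones.

```python
import random
from fractions import Fraction
from itertools import count

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def indel_length(rng, p=0.5):
    """Hypothetical placeholder sampler: geometric, P(L=k) = (1-p)**(k-1) * p.
    This is NOT the Qian-Goldstein distribution used by Simprot."""
    k = 1
    while rng.random() > p:
        k += 1
    return k


def evolve_branch(seq, t, rng, uid, sub_rate=1.0, indel_rate=0.05):
    """Gillespie-style Monte Carlo along one branch of length t.

    seq is a list of (label, aa) pairs with label = (position, unique id).
    Positions increase strictly along each sequence, and inherited residues
    keep their label, so equal labels mean homologous residues."""
    seq = list(seq)  # copy: the parent's sequence is left untouched
    while seq:
        n = len(seq)
        # waiting time to the next event; every site has the same total rate
        t -= rng.expovariate(n * (sub_rate + indel_rate))
        if t < 0:
            break
        i = rng.randrange(n)
        if rng.random() < sub_rate / (sub_rate + indel_rate):
            # substitution: uniform choice among the 19 other residues (assumption)
            label, aa = seq[i]
            seq[i] = (label, rng.choice(AMINO_ACIDS.replace(aa, "")))
        elif rng.random() < 0.5:
            # deletion of L residues starting at site i (clipped at the end)
            del seq[i:i + indel_length(rng)]
        else:
            # insertion of L random residues right after site i, with positions
            # strictly between the neighbours and fresh unique ids
            length = indel_length(rng)
            lo = seq[i][0][0]
            hi = seq[i + 1][0][0] if i + 1 < n else lo + 1
            new = [((lo + (hi - lo) * Fraction(k + 1, length + 1), next(uid)),
                    rng.choice(AMINO_ACIDS)) for k in range(length)]
            seq[i + 1:i + 1] = new
    return seq


def simulate(tree, root_len=50, seed=1):
    """tree: list of (name, branch_length, children) tuples below the root."""
    rng = random.Random(seed)
    uid = count()
    root = [((Fraction(i), next(uid)), rng.choice(AMINO_ACIDS))
            for i in range(root_len)]
    leaves = {}

    def walk(children, parent_seq):
        for name, length, kids in children:
            seq = evolve_branch(parent_seq, length, rng, uid)
            if kids:
                walk(kids, seq)
            else:
                leaves[name] = seq

    walk(tree, root)
    return leaves


def true_alignment(leaves):
    """One column per homology label, ordered by (position, id)."""
    columns = sorted({label for seq in leaves.values() for label, _ in seq})
    index = {label: j for j, label in enumerate(columns)}
    rows = {}
    for name, seq in leaves.items():
        row = ["-"] * len(columns)
        for label, aa in seq:
            row[index[label]] = aa
        rows[name] = "".join(row)
    return rows


if __name__ == "__main__":
    # hypothetical tree ((B,C)X, A); with sub_rate=1, branch lengths are
    # expected substitutions per site
    tree = [("A", 0.3, []), ("X", 0.1, [("B", 0.2, []), ("C", 0.2, [])])]
    for name, row in true_alignment(simulate(tree)).items():
        print(name, row)
```

The unique id in each label keeps two independent insertions at the same spot in different lineages from being treated as homologous. Exact `Fraction` positions guarantee that a new residue can always be placed strictly between its neighbours without floating-point collisions.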
References (partial list; Scilit indexes 16 references for this publication):
- Indel-Based Evolutionary Distance and Mouse–Human Divergence. Genome Research, 2004
- Empirical Analysis of Protein Insertions and Deletions Determining Parameters for the Correct Placement of Gaps in Protein Sequence Alignments. Journal of Molecular Biology, 2004
- High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome. Nature, 2004
- Context of deletions and insertions in human coding sequences. Human Mutation, 2004
- A "Long Indel" Model For Evolutionary Sequence Alignment. Molecular Biology and Evolution, 2003
- PSeq-Gen: an application for the Monte Carlo simulation of protein sequence evolution along phylogenetic trees. Bioinformatics, 1997
- Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Bioinformatics, 1997
- Empirical and Structural Models for Insertions and Deletions in the Divergent Evolution of Proteins. Journal of Molecular Biology, 1993
- The rapid generation of mutation data matrices from protein sequences. Bioinformatics, 1992
- Inching toward reality: An improved likelihood model of sequence evolution. Journal of Molecular Evolution, 1992