Sequence Alignment with Tandem Duplication
- 1 January 1997
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 4 (3) , 351-367
- https://doi.org/10.1089/cmb.1997.4.351
Abstract
Algorithm development for comparing and aligning biological sequences has, until recently, been based on the SI model of mutational events, which assumes that sequences are modified through substitution, insertion, or deletion (the latter two collectively termed indels). While this model has worked fairly well, it has long been apparent that other mutational events occur. In this paper, we introduce a new model, the DSI model, which includes another common mutational event: tandem duplication. Tandem duplication produces tandem repeats, which are common in DNA, making up perhaps 10% of the human genome. They are responsible for some human diseases and may serve a multitude of functions in DNA regulation and evolution. Using the DSI model, we develop new exact and heuristic algorithms for comparing and aligning DNA sequences when they contain tandem repeats.
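To make the motivation concrete, the following is a minimal illustrative sketch and not the paper's DSI algorithms, which the abstract does not spell out. It computes the classic unit-cost edit distance under the SI operations (substitution, insertion, deletion) and applies one tandem duplication to a toy sequence. The function names `si_distance` and `tandem_duplicate`, the unit costs, and the example sequence are all assumptions chosen for illustration.

```python
def si_distance(a: str, b: str) -> int:
    """Unit-cost edit distance using only substitution, insertion, deletion.

    Assumption: every operation costs 1; real scoring schemes differ.
    """
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (0) or substitute (1)
            ))
        prev = curr
    return prev[-1]


def tandem_duplicate(seq: str, start: int, length: int, copies: int = 2) -> str:
    """Replace seq[start:start+length] with `copies` adjacent copies of itself."""
    unit = seq[start:start + length]
    return seq[:start] + unit * copies + seq[start + length:]


# Hypothetical toy sequence: duplicate the unit "GTT" once, in place.
ancestor = "ACGTTGCA"
descendant = tandem_duplicate(ancestor, start=2, length=3)

print(descendant)                         # ACGTTGTTGCA
print(si_distance(ancestor, descendant))  # 3
```

In this toy case a single duplication event is charged as three separate insertions under the SI operations, which hints at why a model with an explicit duplication operation could describe such sequences more naturally.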