On Combinatorial DNA Word Design
- 1 June 2001
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 8 (3) , 201-219
- https://doi.org/10.1089/10665270152530818
Abstract
We consider the problem of designing DNA codes, namely sets of equi-length words over the alphabet {A, C, G, T} that satisfy certain combinatorial constraints. This problem is motivated by the task of reliably storing and retrieving information in synthetic DNA strands for use in DNA computing or as molecular bar codes in chemical libraries. The primary constraints that we consider, defined with respect to a parameter d, are as follows: for every pair of words w, x in a code, there are at least d mismatches between w and x if w ≠ x and also between the reverse of w and the Watson-Crick complement of x. Extending classical results from coding theory, we present several upper and lower bounds on the maximum size of such DNA codes and give methods for constructing such codes. An additional constraint that is relevant to the design of DNA codes is that the free energies and enthalpies of the code words, and thus the melting temperatures, be similar. We describe dynamic programming algorithms that can (a) calculate the total number of words of length n whose free energy value, as approximated by a formula of Breslauer et al. (1986), falls in a given range, and (b) output a random such word. These algorithms are intended for use in heuristic algorithms for constructing DNA codes.
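The two ideas in the abstract can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. The function `satisfies_constraints` checks the two mismatch constraints for a candidate code and parameter d. The function `count_in_range` shows the general shape of a dynamic program that counts words of length n whose score falls in a range, assuming the score is a sum of integer weights over adjacent base pairs. All function names are hypothetical, and `TOY_WEIGHT` is a made-up placeholder (it counts G/C bases in each adjacent pair); it does not reproduce the free energy values of Breslauer et al. (1986).

```python
from collections import Counter

BASES = "ACGT"
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def hamming(u, v):
    """Number of positions at which two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))


def satisfies_constraints(words, d):
    """Check both constraints from the abstract for parameter d.

    1. Distinct words w, x differ in at least d positions.
    2. For every pair w, x (including w == x), the reverse of w and the
       Watson-Crick complement of x differ in at least d positions.
    """
    code = sorted(set(words))  # treat the code as a set of distinct words
    for w in code:
        for x in code:
            if w != x and hamming(w, x) < d:
                return False
            x_complement = "".join(COMPLEMENT[b] for b in x)
            if hamming(w[::-1], x_complement) < d:
                return False
    return True


# Placeholder weights (an assumption for illustration only): the number of
# G/C bases in each adjacent pair. Real nearest-neighbor parameters would be
# substituted here, scaled to integers.
TOY_WEIGHT = {a + b: (a in "GC") + (b in "GC") for a in BASES for b in BASES}


def count_in_range(n, lo, hi, pair_weight=TOY_WEIGHT):
    """Count words of length n whose summed adjacent-pair weight is in [lo, hi].

    The table maps (last base, score so far) to the number of words with
    that ending and score, and is extended one base at a time.
    """
    table = Counter({(b, 0): 1 for b in BASES})
    for _ in range(n - 1):
        extended = Counter()
        for (last, score), count in table.items():
            for b in BASES:
                extended[(b, score + pair_weight[last + b])] += count
        table = extended
    return sum(c for (_, s), c in table.items() if lo <= s <= hi)
```

For example, `satisfies_constraints(["AAAA", "CCCC"], 4)` holds, while `satisfies_constraints(["ACGT"], 1)` fails because ACGT equals its own reverse complement. The sketch covers only the counting task (a); sampling a random word, task (b), would need the same table traversed backwards and is not shown.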
This publication has 17 references indexed in Scilit:
- Good encodings for DNA-based solutions to combinatorial problems. American Mathematical Society (AMS), 1998
- DNA sequences useful for computation. American Mathematical Society (AMS), 1998
- Demonstration of a word design strategy for DNA computing on surfaces. Nucleic Acids Research, 1997
- Molecular Computation of Solutions to Combinatorial Problems. Science, 1994
- Encoded combinatorial chemistry. Proceedings of the National Academy of Sciences, 1992
- Some new constant weight codes. IEEE Transactions on Information Theory, 1991
- A new table of constant weight codes. IEEE Transactions on Information Theory, 1990
- Predicting DNA duplex stability from the base sequence. Proceedings of the National Academy of Sciences, 1986
- On the construction of comma-free codes. IEEE Transactions on Information Theory, 1965
- Comma-Free Codes. Canadian Journal of Mathematics, 1958