A compression algorithm for DNA sequences
- 1 January 2001
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Engineering in Medicine and Biology Magazine
- Vol. 20 (4), 61-66
- https://doi.org/10.1109/51.940049
Abstract
We present a DNA compression algorithm, GenCompress, based on approximate matching that gives the best compression results on standard benchmark DNA sequences. We present the design rationale of GenCompress based on approximate matching, discuss details of the algorithm, provide experimental results, and compare the results with the two most effective compression algorithms for DNA sequences (Biocompress-2 and Cfact).
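The abstract does not spell out how approximate matching is used, so the following is only an illustrative sketch of the general idea, not the GenCompress procedure itself. It assumes substitutions are the only allowed edit and uses a brute-force search. The names `longest_approx_match`, `apply_match`, and `max_mismatches` are hypothetical and do not come from the paper. The sketch looks for an earlier stretch of the sequence that reproduces the upcoming bases after a small number of substitutions, so that the upcoming bases could be described as "copy from there, then patch these positions".

```python
def longest_approx_match(seq, pos, max_mismatches=1):
    """Search seq[:pos] for the copy that best reproduces a prefix of seq[pos:].

    Assumption for this sketch: only substitutions are allowed, at most
    `max_mismatches` of them. Returns (start, length, edits), where edits
    is a list of (offset, base) substitutions to apply to the copied text.
    """
    best = (-1, 0, [])
    for start in range(pos):
        edits = []
        length = 0
        # Extend while the source stays inside the already-seen text.
        while pos + length < len(seq) and start + length < pos:
            if seq[start + length] != seq[pos + length]:
                if len(edits) == max_mismatches:
                    break
                edits.append((length, seq[pos + length]))
            length += 1
        # A substitution at the very end buys nothing; trim it.
        while edits and edits[-1][0] == length - 1:
            edits.pop()
            length -= 1
        if length > best[1]:
            best = (start, length, edits)
    return best


def apply_match(history, start, length, edits):
    """Rebuild the matched text from the history plus its substitutions."""
    copied = list(history[start:start + length])
    for offset, base in edits:
        copied[offset] = base
    return "".join(copied)


seq = "ACGTACGTACGAACGT"
start, length, edits = longest_approx_match(seq, 8, max_mismatches=1)
print(start, length, edits)
print(apply_match(seq[:8], start, length, edits))
```

With exact matching only (`max_mismatches=0`), the same search on this toy sequence finds a much shorter copy, which is the intuition behind allowing approximate repeats in DNA.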
This publication has 16 references indexed in Scilit:
- A guaranteed compression scheme for repetitive DNA sequences. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Significantly lower entropy estimates for natural DNA sequences. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- A new challenge for compression algorithms: Genetic sequences. Published by Elsevier, 2002
- An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics, 2001
- Sequence complexity for biological sequence analysis. Computers & Chemistry, 2000
- Algorithms on Strings, Trees and Sequences. Published by Cambridge University Press (CUP), 1997
- Detection of significant patterns by compression algorithms: the case of approximate tandem repeats in DNA sequences. Bioinformatics, 1997
- Universal Data Compression Algorithm Based on Approximate String Matching. Probability in the Engineering and Informational Sciences, 1996
- Discovery by minimal length encoding: A case study in molecular evolution. Machine Learning, 1993
- A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 1977