Estimating the Repeat Structure and Length of DNA Sequences Using ℓ-Tuples
- 5 August 2003
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 13 (8), 1916-1922
- https://doi.org/10.1101/gr.1251803
Abstract
In shotgun sequencing projects, the genome or BAC length is not always known. We approach estimating genome length by first estimating the repeat structure of the genome or BAC, sometimes of interest in its own right, on the basis of a set of random reads from a genome project. Moreover, we can find the consensus for repeat families before assembly. Our methods are based on the ℓ-tuple content of the reads.
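To make the idea of working from ℓ-tuple content concrete, here is a minimal Python sketch of a generic tuple-spectrum heuristic. It is not the estimator from the paper, which the abstract does not spell out. The function names (`count_tuples`, `estimate_positions`, `copy_number`) are hypothetical. The sketch assumes error-free reads from a single strand with roughly uniform coverage that is high enough for the most common tuple multiplicity to reflect single-copy sequence.

```python
from collections import Counter


def count_tuples(reads, ell):
    """Count every length-ell substring (l-tuple) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - ell + 1):
            counts[read[i:i + ell]] += 1
    return counts


def estimate_positions(reads, ell):
    """Return (estimated number of l-tuple positions, modal depth).

    Assumption: the most common multiplicity among distinct l-tuples
    approximates the sampling depth of single-copy sequence. Dividing
    the total number of sampled l-tuples by that depth then estimates
    how many l-tuple positions the source sequence has, which is close
    to its length when the length is much larger than ell.
    """
    counts = count_tuples(reads, ell)
    if not counts:
        raise ValueError("no l-tuples found: every read is shorter than ell")
    # spectrum maps a multiplicity to the number of distinct l-tuples with it
    spectrum = Counter(counts.values())
    # pick the most frequent multiplicity; ties go to the larger multiplicity
    depth = max(spectrum, key=lambda m: (spectrum[m], m))
    total = sum(counts.values())
    return total / depth, depth


def copy_number(count, depth):
    """Rough copy-number guess for one l-tuple: its count relative to depth."""
    return max(1, round(count / depth))
```

For example, `estimate_positions(["ACGTTGCA"] * 5, 3)` sees each of the six distinct 3-tuples five times and returns `(6.0, 5)`. A tuple observed ten times at depth five would be assigned `copy_number(10, 5) == 2`, which is the simplest way such counts hint at repeat structure. Real shotgun data would need handling for sequencing errors, both strands, and uneven coverage.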