ReAS: Recovery of Ancestral Sequences for Transposable Elements from the Unassembled Reads of a Whole Genome Shotgun
Open Access
- 23 September 2005
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 1 (4) , e43
- https://doi.org/10.1371/journal.pcbi.0010043
Abstract
We describe an algorithm, ReAS, to recover ancestral sequences for transposable elements (TEs) from the unassembled reads of a whole genome shotgun. The main assumptions are that these TEs must exist at high copy numbers across the genome and must not be so old that they are no longer recognizable in comparison to their ancestral sequences. Tested on the japonica rice genome, ReAS was able to reconstruct all of the high copy sequences in the Repbase repository of known TEs, and increase the effectiveness of RepeatMasker in identifying TEs from genome sequences.

Transposable elements (TEs) are a major component of the genomes of multicellular organisms. They are parasitic creatures that invade the genome, insert multiple copies of themselves, and then die. All we see now are the decayed remnants of their ancestral sequences. Reconstruction of these ancestral sequences can bring dead TEs back to life. Algorithms for detecting TEs compare present-day sequences to a library of ancestral sequences. Unknown to many, the pervasive use of whole genome shotgun (WGS) methods in large-scale sequencing has made TE reconstructions increasingly problematic. To minimize assembly errors, WGS methods must reject the highly repetitive sequences that characterize most TEs, especially the most recent TEs, which are the least diverged from their ancestral sequences (and the most informative for reconstruction). This is acceptable to many, because the most important parts of the genes are not repetitive, but for TE aficionados, it is a problem. ReAS is a novel algorithm that does TE reconstruction using only the unassembled reads of a WGS. Tested against the WGS for japonica rice, it is shown to produce a library that is superior to the manually curated Repbase database of known ancestral TEs.
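The high copy number assumption can be illustrated with a small sketch. This is not the ReAS implementation, and the abstract does not say how ReAS detects repeats; the sketch only assumes that a sequence present at many copies in the genome will produce substrings (k-mers) that recur unusually often across unassembled reads. The function names, the k-mer length `k`, and the threshold `min_count` are all hypothetical choices made for illustration, and reverse-complement strands are ignored for brevity.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def high_copy_kmers(reads, k, min_count):
    """Return the set of k-mers seen at least min_count times (hypothetical threshold)."""
    counts = count_kmers(reads, k)
    return {kmer for kmer, n in counts.items() if n >= min_count}


def repetitive_reads(reads, k, min_count):
    """Return reads containing at least one high-copy k-mer.

    Illustrative filter only: such reads are candidates for coming from a
    high copy number element, not confirmed TE fragments.
    """
    frequent = high_copy_kmers(reads, k, min_count)
    return [
        read
        for read in reads
        if any(read[i:i + k] in frequent for i in range(len(read) - k + 1))
    ]


if __name__ == "__main__":
    toy_reads = ["ACGTACGT", "TTACGTAA", "GGGGCCCC"]
    print(high_copy_kmers(toy_reads, k=4, min_count=3))
    print(repetitive_reads(toy_reads, k=4, min_count=3))
```

On real shotgun data the threshold would have to be set relative to the sequencing depth, since even single-copy sequence is covered by several reads.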