Tracembler – software for in-silico chromosome walking in unassembled genomes
Open Access
- 9 May 2007
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 8 (1), 151
- https://doi.org/10.1186/1471-2105-8-151
Abstract
Whole genome shotgun sequencing produces increasingly high coverage of a genome with random sequence reads. Progressive whole genome assembly and eventual finishing sequencing typically take several years for large eukaryotic genomes. In the interim, all sequence reads of public sequencing projects are made available in repositories such as the NCBI Trace Archive. For a particular locus, sequencing coverage may be high enough early on to produce a reliable local genome assembly. We have developed software, Tracembler, that facilitates in silico chromosome walking by recursively assembling reads of a selected species from the NCBI Trace Archive, starting with reads that significantly match sequence seeds supplied by the user.

Tracembler takes one or more DNA or protein sequences as input to the NCBI Trace Archive BLAST engine to identify matching sequence reads from a species of interest. The BLAST searches are carried out recursively, so that matching sequences identified in one round are used as new queries in the next round. The recursive search stops when no new matching sequences are found, when a given maximum number of queries is exhausted, or when a specified maximum number of rounds of recursion is reached. All matching sequences are then assembled into contigs based on significant sequence overlaps using the CAP3 program.

We demonstrate the validity of the concept and software implementation with an example in which a full-length Chrm2 gene, together with its upstream and downstream genomic regions, was successfully recovered from Rattus norvegicus reads. In a second example, a query with two adjacent Medicago truncatula genes as seeds resulted in a contig that likely identifies the microsyntenic homologous soybean locus.
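The recursive search and its three stopping conditions can be illustrated with a minimal Python sketch. This is not Tracembler's code: the function `recursive_search`, its parameters `max_rounds` and `max_queries`, and the toy `toy_search` function (which stands in for a BLAST query by matching shared 3-mers against a tiny in-memory read set) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical toy "trace archive": read id -> read sequence.
READS: Dict[str, str] = {
    "r1": "ACGTAC",
    "r2": "TACGGA",
    "r3": "GGATTT",
    "r4": "CCCCCC",  # unrelated read, should never be collected
}


def toy_search(query: str, k: int = 3) -> Dict[str, str]:
    """Stand-in for one BLAST search: a read matches if it shares a k-mer."""
    kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    return {rid: seq for rid, seq in READS.items()
            if any(kmer in seq for kmer in kmers)}


def recursive_search(
    seeds: Dict[str, str],
    search: Callable[[str], Dict[str, str]],
    max_rounds: int = 5,
    max_queries: int = 100,
) -> Dict[str, str]:
    """Collect reads reachable from the seeds by repeated similarity search."""
    collected: Dict[str, str] = {}             # all matching reads found so far
    queries: List[str] = list(seeds.values())  # round 1 queries are the seeds
    queries_used = 0

    for _ in range(max_rounds):                # stop: round limit reached
        new_reads: Dict[str, str] = {}
        for query in queries:
            if queries_used >= max_queries:    # stop: query budget exhausted
                break
            queries_used += 1
            for read_id, read_seq in search(query).items():
                if read_id not in collected and read_id not in new_reads:
                    new_reads[read_id] = read_seq
        collected.update(new_reads)
        if not new_reads or queries_used >= max_queries:
            break                              # stop: nothing new, or budget spent
        queries = list(new_reads.values())     # new matches become next queries
    return collected


found = recursive_search({"seed": "ACGT"}, toy_search)
print(sorted(found))  # ['r1', 'r2', 'r3']
```

In this toy run the seed reaches `r1` and `r2` directly, `r2` then reaches `r3`, and the third round finds nothing new, so the walk ends. In the workflow the abstract describes, the collected reads would next be passed to an assembler such as CAP3.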
Tracembler streamlines recursive database searches, sequence assembly, and gene identification in the resulting contigs, with the aim of identifying homologous loci of genes of interest in species with emerging whole genome shotgun reads. A web server hosting Tracembler is provided at http://www.plantgdb.org/tool/tracembler/, and the software is also freely available from the authors for local installations.