A Quantitative Comparison of DNA Sequence Assembly Programs
- 1 January 1994
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 1 (4) , 257-269
- https://doi.org/10.1089/cmb.1994.1.257
Abstract
We have compared 11 sequence assembly programs for the accuracy and reproducibility with which they assemble DNA fragments into a completed sequence. To test the assemblers under controlled conditions, the rat multidrug resistance (RATMDRM) gene sequence was randomly divided into overlapping 200- to 400-base fragments. Various degrees of error, in the form of misidentified bases, missed bases, and duplicated bases, were randomly added to these fragments. The probability of an error, and the type of error, was modified using an error distribution template that was developed by comparing the original fragments used to sequence RATMDRM with the final, edited sequence stored in GenBank. From 0 to 15% error was then added to independent sets of fragments, and assemblage was attempted. The quality of the assemblages was evaluated by comparing the number of differences between the assembled sequence and the original sequence. Tests were also done to determine if the order in which fragments were added to a project affected the final sequence and if the quality of assemblage was sequence dependent. Similar results were also obtained using other, unrelated sequences. The programs could be roughly divided into three groups based on the accuracy and reproducibility of assembly. Three (GCG, FAB, and AutoAssembler) consistently produced consensus sequences of low error and high reproducibility. Intermediate results were obtained with five other programs (Sequencher, AssemblyLIGN, XBAP, SeqMan, and AutoAssembler in a mode that made use of an external special processor). Less satisfactory results were obtained with the remaining three programs (GeneWorks, GENeration, and PC/Gene). The ability of the programs to edit the assembled sequence was also compared. Five of the programs were able to display and edit automatic sequencer trace files. 
The Sequencher program had a particularly well-designed sequence editor that allowed rapid examination and correction of assembly errors.
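
The test procedure described in the abstract (random 200- to 400-base fragments, injected substitution, deletion, and duplication errors, then a count of differences against the original sequence) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the function names `fragment`, `add_errors`, and `edit_distance` are hypothetical, the uniform `weights` stand in for the paper's empirical error distribution template (which is not given here), and Levenshtein distance is an assumed way of counting differences.

```python
import random

BASES = "ACGT"

def fragment(seq, n_frags, min_len=200, max_len=400, rng=random):
    """Draw n_frags random substrings of seq, each min_len..max_len bases.

    Assumes len(seq) >= min_len. Overlap is not enforced; it arises when
    enough fragments are drawn to cover the sequence several times over.
    """
    frags = []
    for _ in range(n_frags):
        length = rng.randint(min_len, min(max_len, len(seq)))
        start = rng.randint(0, len(seq) - length)
        frags.append(seq[start:start + length])
    return frags

def add_errors(frag, rate, weights=(1, 1, 1), rng=random):
    """Corrupt each base independently with probability `rate`.

    `weights` gives the relative odds of (substitution, deletion,
    duplication). The uniform default is a placeholder assumption,
    not the error distribution template used in the paper.
    """
    out = []
    for base in frag:
        if rng.random() >= rate:
            out.append(base)          # base read correctly
            continue
        kind = rng.choices(("sub", "del", "dup"), weights=weights)[0]
        if kind == "sub":             # misidentified base
            out.append(rng.choice([b for b in BASES if b != base]))
        elif kind == "dup":           # duplicated base
            out.append(base * 2)
        # "del": missed base, so nothing is appended
    return "".join(out)

def edit_distance(a, b):
    """Levenshtein distance: substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]
```

Under these assumptions, one test run would call `fragment` on the reference sequence, apply `add_errors` with a rate between 0 and 0.15 to each fragment, pass the fragments to an assembler, and score the resulting consensus with `edit_distance` against the reference.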