A graph theoretic approach to the analysis of DNA sequencing data.
- 1 February 1996
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 6 (2) , 80-91
- https://doi.org/10.1101/gr.6.2.80
Abstract
The analysis of data from automated DNA sequencing instruments has been a limiting factor in the development of new sequencing technology. A new base-calling algorithm that is intended to be independent of any particular sequencing technology has been developed and shown to be effective with data from the Applied Biosystems 373 sequencing system. This algorithm makes use of a nonlinear deconvolution filter to detect likely oligomer events and a graph theoretic editing strategy to find the subset of those events that is most likely to correspond to the correct sequence. Metrics evaluating the quality and accuracy of the resulting sequence are also generated and have been shown to be predictive of measured error rates. Compared to the Applied Biosystems Analysis software, this algorithm generates 18% fewer insertion errors, 80% more deletion errors, and 4% fewer mismatches. The tradeoff between different types of errors can be controlled through a secondary editing step that inserts or deletes base calls depending on their associated confidence values.
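The abstract does not say how the graph is built or scored, so the following is only an illustrative sketch of the general idea of picking the most plausible subset of candidate events. It assumes that each candidate event carries a position, a base, and a confidence in (0, 1], that events are nodes in a time-ordered directed acyclic graph, and that a path is scored by summing confidences and penalizing gaps that deviate from an expected peak spacing. The names `Event`, `best_path`, `spacing`, and `tolerance` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: float        # position of the candidate peak (arbitrary scan units)
    base: str          # proposed base call: A, C, G or T
    confidence: float  # detector score in (0, 1] (assumed, not from the paper)


def best_path(events, spacing=10.0, tolerance=3.0):
    """Return the highest-scoring time-ordered subset of candidate events.

    Assumed scoring: each kept event adds its confidence, and each pair of
    consecutive kept events subtracts ((gap - spacing) / tolerance) ** 2.
    """
    events = sorted(events, key=lambda e: e.time)
    n = len(events)
    if n == 0:
        return []

    best = [0.0] * n   # best score of any path ending at event j
    prev = [None] * n  # predecessor of j on that path, or None if j starts it

    for j in range(n):
        incoming = 0.0  # a path may start at j with no predecessor
        for i in range(j):
            gap = events[j].time - events[i].time
            if gap <= 0:
                continue  # only strictly later events may follow
            candidate = best[i] - ((gap - spacing) / tolerance) ** 2
            if candidate > incoming:
                incoming = candidate
                prev[j] = i
        best[j] = events[j].confidence + incoming

    # Trace back from the best-scoring end node.
    j = max(range(n), key=best.__getitem__)
    path = []
    while j is not None:
        path.append(events[j])
        j = prev[j]
    return path[::-1]


candidates = [
    Event(0.0, "A", 0.9),
    Event(10.0, "C", 0.9),
    Event(14.0, "G", 0.4),  # weak, badly spaced candidate
    Event(20.0, "T", 0.9),
    Event(30.0, "A", 0.9),
]
print("".join(e.base for e in best_path(candidates)))  # ACTA
```

In this toy input the weak candidate at position 14 is left out because including it would cost more in spacing penalty than its confidence adds. The secondary editing step mentioned in the abstract is not modeled here.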
This publication has 5 references indexed in Scilit:
- Neural Networks for Automated Base-calling of Gel-based DNA Sequencing Ladders. Published by Elsevier, 1994
- An adaptive, object oriented strategy for base calling in DNA sequence analysis. Nucleic Acids Research, 1993
- Probabilistic Neural Networks. Published by Elsevier, 1993
- Large-Scale and Automated DNA Sequence Determination. Science, 1991
- A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 1970