Estimation of Errors in “Raw” DNA Sequences: A Validation Study
Open Access
- 1 March 1998
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 8 (3) , 251-259
- https://doi.org/10.1101/gr.8.3.251
Abstract
As DNA sequencing is performed more and more in a mass-production-like manner, efficient quality control measures become increasingly important for process control, as does the ability to compare different methods and projects. One of the fundamental quality measures in sequencing projects is the position-specific error probability at all bases in each individual sequence. Accurate prediction of base-specific error rates from “raw” sequence data would allow immediate quality control as well as benchmarking of different methods and projects, while avoiding the inefficiencies and time delays associated with resequencing and with assessments made after “finishing” a sequence. The program PHRED provides base-specific quality scores that are logarithmically related to error probabilities. This study assessed the accuracy of PHRED’s error-rate prediction by analyzing sequencing projects from six different large-scale sequencing laboratories. All projects used four-color fluorescent sequencing, but the sequencing methods varied widely between projects. The results indicate that error-rate predictions such as those given by PHRED can be highly accurate for a large variety of sequencing methods as well as over a wide range of sequence quality.
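The logarithmic relation between quality scores and error probabilities can be illustrated with a short sketch. It assumes the conventional PHRED definition, Q = -10 · log10(P), which the abstract does not spell out. The function names and the `(quality, is_error)` input format are hypothetical and are not taken from the study. The last function shows one simple way to compare predicted and observed error rates per quality value; it is not necessarily the procedure the authors used.

```python
import math
from collections import defaultdict


def quality_to_error_prob(q):
    """Convert a quality score to an error probability, assuming Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def error_prob_to_quality(p):
    """Convert an error probability (0 < p <= 1) to a quality score."""
    return -10 * math.log10(p)


def predicted_vs_observed(calls):
    """Compare predicted and observed error rates for each quality value.

    `calls` is an iterable of (quality, is_error) pairs, where `is_error`
    is True when the base call disagrees with a trusted reference sequence.
    Returns {quality: (predicted_error_rate, observed_error_rate)}.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for q, is_error in calls:
        totals[q] += 1
        if is_error:
            errors[q] += 1
    return {
        q: (quality_to_error_prob(q), errors[q] / totals[q])
        for q in sorted(totals)
    }
```

Under this definition, a quality score of 20 corresponds to a predicted error probability of 1 in 100, and a score of 30 to 1 in 1000. A prediction is accurate when the observed rate in each quality bin stays close to the predicted one.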