BayesCall: A model-based base-calling algorithm for high-throughput short-read sequencing
- 6 August 2009
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 19 (10), 1884-1895
- https://doi.org/10.1101/gr.095299.109
Abstract
Extracting sequence information from raw images of fluorescence is the foundation underlying several high-throughput sequencing platforms. Some of the main challenges associated with this technology include reducing the error rate, assigning accurate base-specific quality scores, and reducing the cost of sequencing by increasing the throughput per run. To demonstrate how computational advancement can help to meet these challenges, a novel model-based base-calling algorithm, BayesCall, is introduced for the Illumina sequencing platform. Being founded on the tools of statistical learning, BayesCall is flexible enough to incorporate various features of the sequencing process. In particular, it can easily incorporate time-dependent parameters and model residual effects. This new approach significantly improves the accuracy over Illumina's base-caller Bustard, particularly in the later cycles of a sequencing run. For 76-cycle data on a standard viral sample, phiX174, BayesCall improves Bustard's average per-base error rate by ∼51%. The probability of observing each base can be readily computed in BayesCall, and this probability can be transformed into a useful base-specific quality score with a high discrimination ability. A detailed study of BayesCall's performance is presented here.
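The abstract notes that the probability of each base can be transformed into a base-specific quality score. The abstract does not state the exact transformation BayesCall uses, so the following is only a minimal sketch that assumes the common Phred convention, Q = -10 log10(1 - p), where p is the probability of the called base. The function name `call_base`, the `max_q` cap, and the example probabilities are hypothetical and chosen for illustration.

```python
import math


def call_base(posteriors, max_q=60.0):
    """Call the most probable base and attach a Phred-style quality score.

    Assumption: `posteriors` maps each base to a non-negative weight that is
    proportional to its probability. The Phred formula and the `max_q` cap are
    illustrative choices, not taken from the BayesCall paper.
    """
    total = sum(posteriors.values())
    if total <= 0:
        raise ValueError("posterior weights must sum to a positive value")

    # Choose the base with the largest weight, then normalize to a probability.
    base, weight = max(posteriors.items(), key=lambda kv: kv[1])
    p_called = weight / total

    # Error probability of the call; cap the score when the error is zero.
    p_error = 1.0 - p_called
    if p_error <= 0.0:
        quality = max_q
    else:
        quality = min(max_q, -10.0 * math.log10(p_error))
    return base, quality


if __name__ == "__main__":
    # Hypothetical per-base probabilities for a single sequencing cycle.
    example = {"A": 0.99, "C": 0.005, "G": 0.003, "T": 0.002}
    base, quality = call_base(example)
    print(f"called base: {base}, quality: {quality:.1f}")
```

Under this convention, a call with a 1% chance of being wrong receives a score near 20, and each further tenfold reduction in error probability adds 10 to the score.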