Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology
Open Access
- 18 June 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 26 (14) , 1704-1707
- https://doi.org/10.1093/bioinformatics/btq269
Abstract
Motivation: The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, we describe a novel algorithm (Iterative Correction of Reference Nucleotides) that iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy.
Results: Using Plasmodium falciparum (81% A + T content) as an extreme example, we show that the algorithm is highly accurate and corrects over 2000 errors in the reference sequence. We give examples of its application to numerous other eukaryotic and prokaryotic genomes and suggest additional applications.
Availability: The software is available at http://icorn.sourceforge.net
Contact: tdo@sanger.ac.uk; cnewbold@hammer.imm.ox.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
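The iterative idea in the abstract (map reads, correct the reference, then map again against the corrected sequence) can be illustrated with a small toy sketch. This is not the iCORN implementation: it assumes substitution errors only, replaces a real short-read aligner with a brute-force ungapped scan, and uses a simple majority vote to call corrections. All function names, parameters, and thresholds (`best_hit`, `correct_once`, `iterative_correct`, `max_mismatches`, `min_cov`, `min_frac`) are hypothetical and chosen for illustration.

```python
from collections import Counter


def best_hit(read, ref, max_mismatches):
    """Return the ungapped offset with the fewest mismatches, or None.

    Toy stand-in for a short-read aligner (assumption: no indels).
    """
    best = None
    for pos in range(len(ref) - len(read) + 1):
        mm = sum(a != b for a, b in zip(read, ref[pos:pos + len(read)]))
        if mm <= max_mismatches and (best is None or mm < best[0]):
            best = (mm, pos)
    return None if best is None else best[1]


def correct_once(ref, reads, max_mismatches=1, min_cov=3, min_frac=0.8):
    """One round: map reads, pile up bases, apply well-supported changes."""
    piles = [Counter() for _ in ref]
    for read in reads:
        pos = best_hit(read, ref, max_mismatches)
        if pos is None:
            continue  # read does not map under the mismatch limit
        for i, base in enumerate(read):
            piles[pos + i][base] += 1

    new_ref = list(ref)
    changes = 0
    for i, pile in enumerate(piles):
        depth = sum(pile.values())
        if depth < min_cov:
            continue  # too little coverage to call anything
        base, count = pile.most_common(1)[0]
        if base != ref[i] and count / depth >= min_frac:
            new_ref[i] = base
            changes += 1
    return "".join(new_ref), changes


def iterative_correct(ref, reads, max_iter=5, **kw):
    """Repeat correction rounds until a round changes nothing."""
    total = 0
    for _ in range(max_iter):
        ref, changes = correct_once(ref, reads, **kw)
        if changes == 0:
            break
        total += changes
    return ref, total


# Toy data: a made-up "true" sequence and a draft with two nearby errors.
truth = "ACGTTGCAAGGCTTACGATC"
draft = truth[:5] + "A" + truth[6] + "T" + truth[8:]
reads = [truth[i:i + 8] for i in range(len(truth) - 7)] * 3

fixed, n_fixed = iterative_correct(draft, reads)
print(fixed == truth, n_fixed)
```

In this toy input the two errors sit close together, so reads spanning both exceed the mismatch limit and fail to map in the first round. Only once the first site is corrected do those reads map and expose the second site, which is the reason for iterating rather than correcting in a single pass.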
This publication has 12 references indexed in Scilit:
- Increasing the coverage of a metapopulation consensus genome by iterative read mapping and assembly. Bioinformatics, 2009
- Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes. Nature Methods, 2009
- A large genome center's improvements to the Illumina sequencing system. Nature Methods, 2008
- Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database. Bioinformatics, 2008
- Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Research, 2008
- Automated correction of genome sequence errors. Nucleic Acids Research, 2004
- Sequence of Plasmodium falciparum chromosomes 1, 3–9 and 13. Nature, 2002
- SSAHA: A Fast Search Method for Large DNA Databases. Genome Research, 2001
- Initial sequencing and analysis of the human genome. Nature, 2001
- Base-Calling of Automated Sequencer Traces Using Phred. II. Error Probabilities. Genome Research, 1998