The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants
Open Access
- 16 December 2009
- journal article
- review article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 38 (6), 1767-1771
- https://doi.org/10.1093/nar/gkp1137
Abstract
FASTQ has emerged as a common file format for sharing sequencing read data combining both the sequence and an associated per base quality score, despite lacking any formal definition to date, and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. Being an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format.
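The abstract mentions conversion between the three variants without giving details. The sketch below illustrates what such a conversion might look like in Python. It relies on conventions that are not stated in the abstract itself and are assumed here from the commonly documented description of the formats: Sanger FASTQ stores PHRED scores with an ASCII offset of 33, Solexa FASTQ stores Solexa scores with an offset of 64, and Illumina 1.3+ FASTQ stores PHRED scores with an offset of 64. All function and constant names are hypothetical and chosen for illustration only.

```python
import math

# Assumed ASCII offsets (not given in the abstract):
SANGER_OFFSET = 33   # Sanger FASTQ: PHRED scores
SOLEXA_OFFSET = 64   # Solexa and Illumina 1.3+ FASTQ


def solexa_to_phred(q_solexa: float) -> float:
    """Map a Solexa score to a PHRED score.

    Uses Q_phred = 10 * log10(10^(Q_solexa / 10) + 1).
    """
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def phred_to_solexa(q_phred: float) -> float:
    """Inverse mapping; undefined for q_phred == 0 (log of zero)."""
    return 10 * math.log10(10 ** (q_phred / 10) - 1)


def solexa_string_to_sanger(quality: str) -> str:
    """Re-encode a Solexa quality string as a Sanger quality string."""
    out = []
    for ch in quality:
        q_solexa = ord(ch) - SOLEXA_OFFSET
        # Round to the nearest integer PHRED score before re-encoding.
        q_phred = round(solexa_to_phred(q_solexa))
        out.append(chr(q_phred + SANGER_OFFSET))
    return "".join(out)


def illumina13_string_to_sanger(quality: str) -> str:
    """Illumina 1.3+ already holds PHRED scores, so only the offset shifts."""
    return "".join(
        chr(ord(ch) - SOLEXA_OFFSET + SANGER_OFFSET) for ch in quality
    )
```

For example, under these assumptions `solexa_string_to_sanger("h@;")` maps the Solexa scores 40, 0 and -5 to the PHRED scores 40, 3 and 1. The conversion is lossy at the low end, because several Solexa scores round to the same PHRED value. Production work would normally rely on one of the toolkits named in the abstract rather than hand-written code like this.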