The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants

Open Access

16 December 2009

journal article
review article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 38 (6), 1767-1771
https://doi.org/10.1093/nar/gkp1137

Abstract

FASTQ has emerged as a common file format for sharing sequencing read data, combining both the sequence and an associated per-base quality score, despite lacking any formal definition to date and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them. The definition is based on publicly available information such as the MAQ documentation, and on conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. As this is an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format.
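
The variants differ partly in how a quality score is defined. The relations below restate, for convenience, the two score definitions and the conversions between them. Here P_e is the estimated probability that a base call is wrong. PHRED scores are used by Sanger FASTQ and Illumina 1.3+ FASTQ, and Solexa scores by Solexa FASTQ. The middle line is the one-step derivation linking the two.

```latex
\begin{aligned}
Q_{\mathrm{PHRED}} &= -10\log_{10} P_e,
\qquad
Q_{\mathrm{Solexa}} = -10\log_{10}\frac{P_e}{1-P_e},\\
\frac{1-P_e}{P_e} = 10^{Q_{\mathrm{Solexa}}/10}
&\;\Longrightarrow\;
\frac{1}{P_e} = 10^{Q_{\mathrm{Solexa}}/10} + 1,\\
Q_{\mathrm{PHRED}} &= 10\log_{10}\!\left(10^{Q_{\mathrm{Solexa}}/10} + 1\right),
\qquad
Q_{\mathrm{Solexa}} = 10\log_{10}\!\left(10^{Q_{\mathrm{PHRED}}/10} - 1\right).
\end{aligned}
```

At high quality the two scales nearly coincide; for example, Q_Solexa = 40 gives Q_PHRED of about 40.0004. At low quality they diverge: Q_Solexa = -5 corresponds to Q_PHRED of about 1.19.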
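
The following is a minimal Python sketch of reading FASTQ records and converting quality strings between the variants. It uses the conventions the article describes:

- Sanger FASTQ stores PHRED scores with an ASCII offset of 33.
- Solexa FASTQ stores Solexa scores with an offset of 64.
- Illumina 1.3+ FASTQ stores PHRED scores with an offset of 64.

It assumes the simple four-line record layout, with no line wrapping of sequence or quality. It also assumes a floor of -5 when mapping PHRED 0 to the Solexa scale. All function and constant names (`parse_fastq`, `solexa_qual_to_sanger` and so on) are hypothetical, not the API of Biopython or any other library.

```python
import io
import math
from typing import Iterator, List, NamedTuple, TextIO

# ASCII offsets for the three variants (hypothetical constant names).
SANGER_OFFSET = 33    # Sanger FASTQ: PHRED scores
SOLEXA_OFFSET = 64    # Solexa FASTQ: Solexa scores
ILLUMINA_OFFSET = 64  # Illumina 1.3+ FASTQ: PHRED scores


class FastqRecord(NamedTuple):
    title: str     # text after the leading "@"
    sequence: str
    quality: str   # raw ASCII-encoded quality string


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records, assuming four lines per record with no line wrapping."""
    while True:
        line = handle.readline()
        if not line:
            return  # end of file
        title = line.rstrip("\r\n")
        if not title:
            continue  # tolerate blank lines between or after records
        if not title.startswith("@"):
            raise ValueError(f"expected '@' title line, got {title!r}")
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not plus.startswith("+"):
            raise ValueError(f"expected '+' line, got {plus!r}")
        # The title may optionally be repeated after "+"; if so it must match.
        if len(plus) > 1 and plus[1:] != title[1:]:
            raise ValueError("title repeated on '+' line does not match '@' line")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality strings differ in length")
        yield FastqRecord(title[1:], sequence, quality)


def decode_qualities(quality: str, offset: int) -> List[int]:
    """Convert an ASCII quality string to integer scores."""
    return [ord(ch) - offset for ch in quality]


def solexa_to_phred(q_solexa: float) -> float:
    """Q_PHRED = 10*log10(10**(Q_Solexa/10) + 1)."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def phred_to_solexa(q_phred: float) -> float:
    """Q_Solexa = 10*log10(10**(Q_PHRED/10) - 1), clamped at an assumed floor of -5."""
    if q_phred < 0:
        raise ValueError("PHRED scores cannot be negative")
    if q_phred == 0:
        return -5.0  # log10(0) is undefined; -5 is assumed as the floor
    return max(-5.0, 10 * math.log10(10 ** (q_phred / 10) - 1))


def solexa_qual_to_sanger(quality: str) -> str:
    """Re-encode a Solexa FASTQ quality string as Sanger FASTQ."""
    scores = decode_qualities(quality, SOLEXA_OFFSET)
    # Convert each Solexa score to PHRED and round to the nearest integer.
    phred = [round(solexa_to_phred(q)) for q in scores]
    return "".join(chr(q + SANGER_OFFSET) for q in phred)


def illumina_qual_to_sanger(quality: str) -> str:
    """Illumina 1.3+ already holds PHRED scores, so only the offset changes."""
    return "".join(chr(ord(ch) - ILLUMINA_OFFSET + SANGER_OFFSET) for ch in quality)


if __name__ == "__main__":
    solexa_file = io.StringIO("@read1\nACGT\n+read1\n;@Jh\n")
    for record in parse_fastq(solexa_file):
        print(record.title, record.sequence, solexa_qual_to_sanger(record.quality))
```

Running the module prints `read1 ACGT "$+I`. The Solexa characters `;`, `@`, `J` and `h` (Solexa scores -5, 0, 10 and 40) become PHRED 1, 3, 10 and 40 after rounding. A parser for line-wrapped Sanger files would need to count quality characters against the sequence length rather than split on `@`, because `@` is itself a valid quality character.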

This publication has 19 references indexed in Scilit, including:

Biopython: freely available Python tools for computational molecular biology and bioinformatics
Bioinformatics, 2009
Applied Biosystems SOLiD™ System: Ligation-Based Sequencing
Wiley, 2008
BioJava: an open-source framework for bioinformatics
Bioinformatics, 2008
Genome sequencing in microfabricated high-density picolitre reactors
Nature, 2005
Solexa Ltd
Pharmacogenomics, 2004
The Bioperl Toolkit: Perl Modules for the Life Sciences
Genome Research, 2002
EMBOSS: The European Molecular Biology Open Software Suite
Trends in Genetics, 2000
Base-Calling of Automated Sequencer Traces Using Phred. I. Accuracy Assessment
Genome Research, 1998
Base-Calling of Automated Sequencer Traces Using Phred. II. Error Probabilities
Genome Research, 1998
Improved tools for biological sequence comparison
Proceedings of the National Academy of Sciences, 1988