DNA Sudoku—harnessing high-throughput sequencing for multiplexed specimen analysis
- 15 May 2009
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 19 (7), 1243-1253
- https://doi.org/10.1101/gr.092957.109
Abstract
Next-generation sequencers have sufficient power to analyze simultaneously DNAs from many different specimens, a practice known as multiplexing. Such schemes rely on the ability to associate each sequence read with the specimen from which it was derived. The current practice of appending molecular barcodes prior to pooling is practical for parallel analysis of up to many dozen samples. Here, we report a strategy that permits simultaneous analysis of tens of thousands of specimens. Our approach relies on the use of combinatorial pooling strategies in which pools rather than individual specimens are assigned barcodes. Thus, the identity of each specimen is encoded within the pooling pattern rather than by its association with a particular sequence tag. Decoding the pattern allows the sequence of an original specimen to be inferred with high confidence. We verified the ability of our encoding and decoding strategies to accurately report the sequence of individual samples within a large number of mixed specimens in two ways. First, we simulated data both from a clone library and from a human population in which a sequence variant associated with cystic fibrosis was present. Second, we actually pooled, sequenced, and decoded identities within two sets of 40,000 bacterial clones comprising approximately 20,000 different artificial microRNAs targeting Arabidopsis or human genes. We achieved greater than 97% accuracy in these trials. The strategies reported here can be applied to a wide variety of biological problems, including the determination of genotypic variation within large populations of individuals.
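The abstract does not spell out the pooling design, so the following is only a minimal sketch of the general idea of encoding specimen identity in a pooling pattern. It assumes a modular design in which specimen number s is placed in pool s mod w for each of several pairwise coprime window sizes w. The window sizes, the specimen count, and the function names (`pool_signature`, `build_pools`, `decode`) are hypothetical choices made for illustration, and the decoder handles only the simplest case of error-free presence or absence calls.

```python
from itertools import combinations
from math import gcd


def pool_signature(specimen, windows):
    """Return the pool index of one specimen in each pooling window."""
    return tuple(specimen % w for w in windows)


def build_pools(n_specimens, windows):
    """Map (window_index, pool_index) to the set of specimens in that pool.

    Each key corresponds to one barcoded pool, so the number of keys is the
    number of barcodes needed (assumed design, not taken from the abstract).
    """
    pools = {}
    for s in range(n_specimens):
        for i, w in enumerate(windows):
            pools.setdefault((i, s % w), set()).add(s)
    return pools


def decode(positive_pools, n_specimens, windows):
    """Return specimens all of whose pools appear among the positive pools."""
    return [
        s
        for s in range(n_specimens)
        if all((i, s % w) in positive_pools for i, w in enumerate(windows))
    ]


# Hypothetical parameters: 50 specimens, three pairwise coprime windows.
windows = (7, 8, 9)
n_specimens = 50
assert all(gcd(a, b) == 1 for a, b in combinations(windows, 2))

pools = build_pools(n_specimens, windows)

# Suppose a variant is observed only in the pools that contain specimen 37.
carrier = 37
positive = {(i, carrier % w) for i, w in enumerate(windows)}

print(len(pools))                             # barcodes used for 50 specimens
print(pool_signature(carrier, windows))       # pooling pattern of the carrier
print(decode(positive, n_specimens, windows)) # specimens consistent with it
```

In this toy setting 24 barcoded pools cover 50 specimens, and the pattern of positive pools points back to a single specimen. With several carriers, or with sequencing errors, more than one specimen can match the positive pools, so a practical decoder would need to weigh read counts rather than rely on exact set membership.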