RTCR: a pipeline for complete and accurate recovery of T cell repertoires from high throughput sequencing data
Open Access
- 20 June 2016
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 32 (20) , 3098-3106
- https://doi.org/10.1093/bioinformatics/btw339
Abstract
Motivation: High Throughput Sequencing (HTS) has enabled researchers to probe the human T cell receptor (TCR) repertoire, which consists of many rare sequences. Distinguishing between true but rare TCR sequences and variants generated by polymerase chain reaction (PCR) and sequencing errors remains a formidable challenge. The conventional approach to handle errors is to remove low quality reads, and/or rare TCR sequences. Such filtering discards a large number of true and often rare TCR sequences. However, accurate identification and quantification of rare TCR sequences is essential for repertoire diversity estimation.
Results: We devised a pipeline, called Recover TCR (RTCR), that accurately recovers TCR sequences, including rare TCR sequences, from HTS data (including barcoded data) even at low coverage. RTCR employs a data-driven statistical model to rectify PCR and sequencing errors in an adaptive manner. Using simulations, we demonstrate that RTCR can easily adapt to the error profiles of different types of sequencers and exhibits consistently high recall and high precision even at low coverages where other pipelines perform poorly. Using published real data, we show that RTCR accurately resolves sequencing errors and outperforms all other pipelines.
Availability and Implementation: The RTCR pipeline is implemented in Python (v2.7) and C and is freely available at http://uubram.github.io/RTCR/ along with documentation and examples of typical usage.
Contact: b.gerritsen@uu.nl
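The recall and precision mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from RTCR: the function name `precision_recall` and the toy sequences are hypothetical, and it assumes a simulated setting where the true set of TCR sequences is known and a recovered sequence counts as correct only on an exact match.

```python
def precision_recall(recovered, truth):
    """Return (precision, recall) for a recovered set of TCR sequences.

    precision: fraction of recovered sequences that are truly present.
    recall:    fraction of true sequences that were recovered.
    Both arguments are iterables of sequence strings (hypothetical input format).
    """
    recovered, truth = set(recovered), set(truth)
    true_positives = len(recovered & truth)
    precision = true_positives / len(recovered) if recovered else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Made-up toy data: four true sequences, four recovered sequences,
# one of which is an erroneous variant (last residue differs).
truth = {"CASSLGQGNTEAFF", "CASSPGTGGYEQYF", "CASSQDRGNQPQHF", "CASSLAGGYNEQFF"}
recovered = {"CASSLGQGNTEAFF", "CASSPGTGGYEQYF", "CASSLAGGYNEQFF", "CASSLGQGNTEAFL"}

precision, recall = precision_recall(recovered, truth)
print(precision, recall)
```

In this toy case an uncorrected error variant lowers precision, while a true sequence that was filtered out lowers recall, which is the trade-off the abstract describes for filtering-based approaches.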