Statistical inference of the generation probability of T-cell receptors from sequence repertoires
- 17 September 2012
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 109 (40), 16161-16166
- https://doi.org/10.1073/pnas.1212755109
Abstract
Stochastic rearrangement of germline V-, D-, and J-genes to create variable coding sequence for certain cell surface receptors is at the origin of immune system diversity. This process, known as “VDJ recombination”, is implemented via a series of stochastic molecular events involving gene choices and random nucleotide insertions between, and deletions from, genes. We use large sequence repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta chains to infer the statistical properties of these basic biochemical events. Because any given CDR3 sequence can be produced in multiple ways, the probability distribution of hidden recombination events cannot be inferred directly from the observed sequences; we therefore develop a maximum likelihood inference method to achieve this end. To separate the properties of the molecular rearrangement mechanism from the effects of selection, we focus on nonproductive CDR3 sequences in T-cell DNA. We infer the joint distribution of the various generative events that occur when a new T-cell receptor gene is created. We find a rich picture of correlation (and absence thereof), providing insight into the molecular mechanisms involved. The generative event statistics are consistent between individuals, suggesting a universal biochemical process. Our probabilistic model predicts the generation probability of any specific CDR3 sequence by the primitive recombination process, allowing us to quantify the potential diversity of the T-cell repertoire and to understand why some sequences are shared between individuals. We argue that the use of formal statistical inference methods, of the kind presented in this paper, will be essential for quantitative understanding of the generation and evolution of diversity in the adaptive immune system.