VDJtools: Unifying Post-analysis of T Cell Receptor Repertoires

Open Access
Abstract
Despite the growing number of immune repertoire sequencing studies, the field still lacks software for analysis and comprehension of this high-dimensional data. Here we report VDJtools, a complementary software suite that solves a wide range of T cell receptor (TCR) repertoire post-analysis tasks, provides detailed tabular output and publication-ready graphics, and is built on top of a flexible API. Using TCR datasets for a large cohort of unrelated healthy donors, twins, and multiple sclerosis patients, we demonstrate that VDJtools greatly facilitates the analysis and leads to sound biological conclusions. VDJtools software and documentation are available at https://github.com/mikessh/vdjtools.
High-throughput profiling of T- and B-cell antigen receptor repertoires promises great advances in our understanding of the mechanisms underlying adaptive immune system function, treatment of autoimmune and infectious diseases, and development of novel approaches in cancer immunotherapy. A number of recently developed software tools aim at processing immune repertoire data by mapping Variable (V), Diversity (D) and Joining (J) antigen receptor segments to sequencing reads and assembling T- and B-cell clonotypes. Nevertheless, a major gap remains in common methods of data post-analysis in the field: there is no standardized data format so far, and most comparative data analysis is carried out using a variety of in-house scripts. Here we present VDJtools, a software framework that can analyze the output of the most commonly used TCR repertoire processing tools and supports a diverse set of post-analysis strategies.
The main aims of our framework are: to ensure consistency of post-analysis methods and reproducibility of the obtained results; to save the time of bioinformaticians analyzing TCR repertoire data by providing comprehensive tabular output and an open-source API; and to provide a command line tool simple enough that immunologists and biologists with little computational background can use it to generate publication-ready results.