HSPVdb—the Human Short Peptide Variation Database for improved mass spectrometry-based detection of polymorphic HLA-ligands
Open Access
- 2 December 2010
- journal article
- research article
- Published by Springer Nature in Immunogenetics
- Vol. 63 (3) , 143-153
- https://doi.org/10.1007/s00251-010-0497-1
Abstract
T cell epitopes derived from polymorphic proteins or from proteins encoded by alternative reading frames (ARFs) play an important role in (tumor) immunology. Identification of these peptides is successfully performed with mass spectrometry. In a mass spectrometry-based approach, the recorded tandem mass spectra are matched against hypothetical spectra generated from known protein sequence databases. Commonly used protein databases contain a minimal level of redundancy and are thus not suitable data sources for searching polymorphic T cell epitopes, either in normal reading frames or in ARFs. At the same time, however, these databases contain much non-polymorphic sequence information, thereby complicating the matching of recorded and theoretical spectra and increasing the potential for finding false positives. Therefore, we created a database with peptides from ARFs and peptide variation arising from single nucleotide polymorphisms (SNPs). It is based on the human mRNA sequences from the well-annotated reference sequence (RefSeq) database and associated variation information derived from the Single Nucleotide Polymorphism Database (dbSNP). In this process, we removed all non-polymorphic information. Investigation of the frequency of SNPs in dbSNP revealed that many SNPs are non-polymorphic “SNPs”. Therefore, we removed those from our dedicated database, and this resulted in a comprehensive, high-quality database, which we coined the Human Short Peptide Variation Database (HSPVdb). The value of our HSPVdb is shown by identification of the majority of published polymorphic SNP- and/or ARF-derived epitopes from a mass spectrometry-based proteomics workflow, and by a large variety of polymorphic peptides identified as potential T cell epitopes in the HLA-ligandome presented by the Epstein–Barr virus cells.
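The core idea of the abstract, keeping only the short peptide stretches that a SNP actually changes, in the normal frame or in an ARF, can be illustrated with a small sketch. This is not the authors' pipeline: the function names (`translate`, `variant_peptides`), the `flank` parameter, and the toy sequence in the usage note are assumptions made for illustration, and the sketch does not cut peptides at stop codons or consult RefSeq or dbSNP.

```python
from itertools import product

# Standard genetic code in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame=0):
    """Translate a DNA-alphabet sequence starting at offset `frame` (0, 1 or 2)."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def variant_peptides(seq, snp_pos, alt, frame=0, flank=2):
    """Return (reference, variant) peptide windows around a SNP, or None.

    `snp_pos` is a 0-based nucleotide index and `alt` the alternative base.
    None is returned when the SNP lies outside the translated region or is
    synonymous in the chosen frame, i.e. carries no peptide-level variation.
    Hypothetical helper for illustration only.
    """
    if snp_pos < frame:
        return None
    mutated = seq[:snp_pos] + alt + seq[snp_pos + 1:]
    ref_prot = translate(seq, frame)
    var_prot = translate(mutated, frame)
    idx = (snp_pos - frame) // 3  # residue affected by the SNP
    if idx >= len(ref_prot) or ref_prot[idx] == var_prot[idx]:
        return None
    lo, hi = max(0, idx - flank), idx + flank + 1
    return ref_prot[lo:hi], var_prot[lo:hi]
```

For the toy sequence `ATGGCTGAAAAGCTG`, `variant_peptides(seq, 7, "G")` yields the pair `("MAEKL", "MAGKL")`, whereas a synonymous change such as `variant_peptides(seq, 5, "C")` yields `None` and would be dropped. Passing `frame=1` or `frame=2` applies the same logic to an alternative reading frame.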