Detection and Validation of Non-synonymous Coding SNPs from Orthogonal Analysis of Shotgun Proteomics Data
- 9 May 2007
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Proteome Research
- Vol. 6 (6) , 2331-2340
- https://doi.org/10.1021/pr0700908
Abstract
Orthogonal analysis of amino acid substitutions resulting from SNPs in existing proteomic datasets provides a critical foundation for the emerging field of population-based proteomics. Large-scale proteomics datasets, derived from shotgun tandem MS analysis of complex cellular protein mixtures, contain many unassigned spectra that may correspond to alternate alleles coded by SNPs. The purpose of this work was to identify tandem MS spectra in LC−MS/MS shotgun proteomics datasets that may represent nonsynonymous coding SNPs (nsSNPs). To this end, we generated a tryptic peptide database from allelic information found in NCBI's dbSNP. We searched this database with tandem MS spectra of tryptic peptides from DU4475 breast tumor cells that had been fractionated by pI (isoelectric focusing) in the first dimension and by reversed-phase LC in the second dimension. In all, we identified 629 nsSNPs, of which 36 were alternate SNP alleles not found in the reference NCBI or IPI protein databases. Searches for SNP-peptides carry a high risk of false positives, due both to mass shifts caused by modifications and to multiple representations of the same peptide within the genome. In this work, false positives were filtered using a novel peptide pI prediction algorithm and characterized using a decoy database built by random substitution of similarly sized reference peptides. Secondary validation by sequencing of the corresponding genomic DNA confirmed the presence of the predicted SNP in 8 of 10 SNP-peptides. This work highlights that the usefulness of interpreting unassigned spectra as polymorphisms depends heavily on the ability to detect and filter false positives.
Keywords: LC−MS/MS • single nucleotide polymorphism • false-positives • isoelectric focusing • pI filtering • population proteomics
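The abstract names three computational steps: building SNP-containing tryptic peptides, filtering candidates by predicted pI against the IEF fraction they came from, and building a decoy set by random substitution. The sketch below illustrates each step under stated assumptions; it is not the authors' pipeline. All function names, the example protein, and the 0.5 pH-unit tolerance are hypothetical. The trypsin rule is the simplified "cleave after K/R unless followed by P". The pI predictor is a plain Henderson-Hasselbalch bisection with one illustrative pKa set, swapped in for the paper's novel algorithm. The decoy step assumes "random substitution" means replacing one randomly chosen residue of a reference peptide.

```python
import random
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Illustrative pKa values (assumption; many published sets exist).
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def tryptic_digest(protein: str) -> list[str]:
    """Simplified trypsin rule: cut after K or R unless the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def variant_peptides(protein: str, pos: int, ref: str, alt: str) -> list[str]:
    """Peptides produced only by the alternate allele (0-based position)."""
    if protein[pos] != ref:
        raise ValueError("reference residue does not match the sequence")
    variant = protein[:pos] + alt + protein[pos + 1:]
    # Set difference also captures new or lost cleavage sites (K/R changes).
    return sorted(set(tryptic_digest(variant)) - set(tryptic_digest(protein)))


def net_charge(peptide: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge with one free N- and C-terminus."""
    counts = {aa: peptide.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    pos = sum(counts[k] / (1 + 10 ** (ph - pka)) for k, pka in PKA_POS.items())
    neg = sum(counts[k] / (1 + 10 ** (pka - ph)) for k, pka in PKA_NEG.items())
    return pos - neg


def predict_pi(peptide: str, lo: float = 0.0, hi: float = 14.0,
               tol: float = 1e-4) -> float:
    """Bisect for the pH where net charge crosses zero (charge falls with pH)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(peptide, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def passes_pi_filter(peptide: str, fraction_pi: float,
                     tolerance: float = 0.5) -> bool:
    """Keep a match only if its predicted pI is near its IEF fraction's pI."""
    return abs(predict_pi(peptide) - fraction_pi) <= tolerance


def decoy_peptide(peptide: str, rng: random.Random) -> str:
    """Assumed decoy scheme: swap one random residue for a different one."""
    pos = rng.randrange(len(peptide))
    choices = [aa for aa in AMINO_ACIDS if aa != peptide[pos]]
    return peptide[:pos] + rng.choice(choices) + peptide[pos + 1:]


if __name__ == "__main__":
    protein = "MAGEKTLLPRSSDEFGHKRPAVK"  # hypothetical sequence
    peps = variant_peptides(protein, 12, "D", "N")
    print(peps)  # → ['SSNEFGHK']
    for p in peps:
        print(p, round(predict_pi(p), 2), passes_pi_filter(p, fraction_pi=7.0))
    print(decoy_peptide("SSDEFGHK", random.Random(0)))
```

Using a set difference against the reference digest is a simplification: the paper checks new alleles against whole reference databases (NCBI, IPI), which also guards against the same peptide occurring elsewhere in the proteome.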