A computational method for assessing peptide‐identification reliability in tandem mass spectrometry analysis with SEQUEST
- 23 March 2004
- journal article
- research article
- Published by Wiley in Proteomics
- Vol. 4 (4), 961-969
- https://doi.org/10.1002/pmic.200300656
Abstract
High‐throughput protein identification in mass spectrometry is predominantly achieved by first identifying tryptic peptides by a database search and then combining the peptide hits for protein identification. One of the popular tools used for the database search is SEQUEST. Peptide identification is carried out by selecting SEQUEST hits above a specified threshold, the value of which is typically chosen empirically in an attempt to separate true identifications from false ones. These SEQUEST scores are not normalized with respect to the composition, length and other parameters of the peptides. Furthermore, no rigorous reliability estimate is assigned to the protein identifications derived from these scores. Hence, the interpretation of SEQUEST hits generally requires human involvement, making it difficult to scale up the identification process for genome‐scale applications. To overcome these limitations, we have developed a method that combines a neural network and a statistical model to normalize SEQUEST scores and to provide a reliability estimate for each SEQUEST hit. This method improves the sensitivity and specificity of peptide identification compared with the standard filtering procedure used in the SEQUEST package, and provides a basis for estimating the reliability of protein identifications.
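The abstract contrasts a fixed SEQUEST score cutoff, which gives only a yes/no decision per hit, with a per-hit reliability estimate. The sketch below illustrates the second idea with a swapped-in, generic technique: a two-component Gaussian mixture fitted by expectation-maximization (EM) to one score per hit, from which the posterior probability that a hit is a correct identification is read off. It assumes each hit has already been reduced to a single discriminant score (in the paper a neural network normalizes the SEQUEST scores; here the score is simply given) and that scores of incorrect and correct hits are each roughly normally distributed. The function names are hypothetical, this is not the authors' statistical model, and the scores in the usage example are synthetic.

```python
import math
import random


def log_normal_pdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def posterior_correct(x, weight, mu, sigma):
    """P(correct | score x) under a two-component mixture (hypothetical helper).

    Component 0 models incorrect hits, component 1 models correct hits.
    """
    l0 = math.log(max(weight[0], 1e-12)) + log_normal_pdf(x, mu[0], sigma[0])
    l1 = math.log(max(weight[1], 1e-12)) + log_normal_pdf(x, mu[1], sigma[1])
    d = l0 - l1
    # Numerically stable form of 1 / (1 + exp(l0 - l1)).
    if d > 0:
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


def fit_two_component_mixture(scores, n_iter=200):
    """Fit a 1-D two-component Gaussian mixture to the scores with EM."""
    s = sorted(scores)
    n = len(s)
    # Start the "incorrect" and "correct" means at the lower and upper quartiles.
    mu = [s[n // 4], s[(3 * n) // 4]]
    mean_all = sum(s) / n
    sd_all = max(math.sqrt(sum((x - mean_all) ** 2 for x in s) / n), 1e-3)
    sigma = [sd_all, sd_all]
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of the "correct" component for each score.
        resp = [posterior_correct(x, weight, mu, sigma) for x in scores]
        # M-step: re-estimate each component's weight, mean and spread.
        for k in (0, 1):
            r = resp if k == 1 else [1.0 - ri for ri in resp]
            total = max(sum(r), 1e-12)
            weight[k] = total / n
            mu[k] = sum(ri * x for ri, x in zip(r, scores)) / total
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, scores)) / total
            sigma[k] = math.sqrt(max(var, 1e-6))
    # Keep the convention that component 1 (correct hits) has the higher mean.
    if mu[0] > mu[1]:
        weight.reverse()
        mu.reverse()
        sigma.reverse()
    return weight, mu, sigma


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic scores only: 300 "incorrect" hits near 0, 100 "correct" hits near 4.
    scores = [rng.gauss(0.0, 1.0) for _ in range(300)]
    scores += [rng.gauss(4.0, 1.0) for _ in range(100)]
    weight, mu, sigma = fit_two_component_mixture(scores)
    for score in (0.0, 2.0, 4.0):
        print(score, round(posterior_correct(score, weight, mu, sigma), 3))
```

Unlike a single cutoff, the fitted mixture attaches a probability to every hit, so hits near the boundary between the two score distributions are flagged as uncertain rather than silently accepted or rejected.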