A Statistical Basis for Testing the Significance of Mass Spectrometric Protein Identification Results
- 3 February 2000
- journal article
- research article
- Published by American Chemical Society (ACS) in Analytical Chemistry
- Vol. 72 (5) , 999-1005
- https://doi.org/10.1021/ac990792j
Abstract
A method for testing the significance of mass spectrometric (MS) protein identification results is presented. MS proteolytic peptide mapping and genome database searching provide a rapid, sensitive, and potentially accurate means for identifying proteins. Database search algorithms detect the matching between proteolytic peptide masses from an MS peptide map and theoretical proteolytic peptide masses of the proteins in a genome database. The number of masses that matches is used to compute a score, S, for each protein, and the protein that yields the best score is assumed as the identification result. There is a risk of obtaining a false result, because masses determined by MS are not unique; i.e., each mass in a peptide map can match randomly one or several proteins in a genome database. A false result is obtained when the score, S, due to random matching cannot be discerned from the score due to matching with a real protein in the sample. We therefore introduce the frequency function, f(S), for false (random) identification results as a basis for testing at what significance level, α, one can reject a null hypothesis, H0: “the result is false”. The significance is tested by comparing an experimental score, SE, with a critical score, SC, required for a significant result at the level α. If SE ≥ SC, H0 is rejected. f(S) and SC were obtained by simulations utilizing random tryptic peptide maps generated from a genome database. The critical score, SC, was studied as a function of the number of masses in the peptide map, the mass accuracy, the degree of incomplete enzymatic cleavage, the protein mass range, and the size of the genome. With SC known for a variety of experimental constraints, significance testing can be fully automated and integrated with database searching software used for protein identification.
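The testing procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring function, so the raw count of matching masses stands in for S, and random peptide maps are drawn by sampling masses from a pooled toy database rather than by digesting real genome sequences. All function names, the tolerance parameter, and the data layout (a list of per-protein mass lists) are hypothetical.

```python
import bisect
import random


def count_matches(observed, theoretical, tol):
    """Count observed masses lying within +/- tol of any theoretical mass.

    Assumption: this raw match count stands in for the score S.
    """
    theo = sorted(theoretical)
    n = 0
    for m in observed:
        # First theoretical mass that is >= m - tol
        i = bisect.bisect_left(theo, m - tol)
        if i < len(theo) and theo[i] <= m + tol:
            n += 1
    return n


def simulate_null_scores(database, n_masses, tol, n_trials, rng):
    """Best score per random peptide map: an empirical sample of f(S).

    Assumption: a random map is n_masses masses sampled from the pooled
    database, a simplification of the simulation the abstract describes.
    """
    pool = [m for protein in database for m in protein]
    scores = []
    for _ in range(n_trials):
        random_map = rng.sample(pool, n_masses)
        scores.append(max(count_matches(random_map, p, tol) for p in database))
    return scores


def critical_score(null_scores, alpha):
    """Smallest score s whose upper-tail frequency P(S >= s) is <= alpha."""
    ordered = sorted(null_scores)
    n = len(ordered)
    for s in sorted(set(ordered)):
        tail = n - bisect.bisect_left(ordered, s)
        if tail / n <= alpha:
            return s
    # No simulated score is rare enough; assumes integer-valued scores.
    return ordered[-1] + 1


def is_significant(s_e, s_c):
    """Reject H0 ("the result is false") when SE >= SC."""
    return s_e >= s_c
```

In use, one would call `simulate_null_scores` once per set of experimental constraints (number of masses, mass tolerance, database), derive SC with `critical_score` at the chosen α, and then compare each experimental score against it with `is_significant`. The empirical tail estimate is only as fine as 1 / `n_trials`, so small values of α need correspondingly many simulated maps.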