The Statistical Significance of Protein Identification Results as a Function of the Number of Protein Sequences Searched
- 27 August 2004
- journal article
- Published by American Chemical Society (ACS) in Journal of Proteome Research
- Vol. 3 (5) , 979-982
- https://doi.org/10.1021/pr0499343
Abstract
The potential for obtaining a true mass spectrometric protein identification result depends on the choice of algorithm as well as on experimental factors that influence the information content in the mass spectrometric data. Current methods can never prove definitively that a result is true, but an appropriate choice of algorithm can provide a measure of the statistical risk that a result is false, i.e., the statistical significance. We recently demonstrated an algorithm, Probity, which assigns a statistical significance to each result. For any choice of algorithm, the difficulty of obtaining statistically significant results depends on the number of protein sequences in the sequence collection searched. By simulations of random protein identifications and using the Probity algorithm, we here demonstrate explicitly how the statistical significance depends on the number of sequences searched. We also provide an example of how the practitioner's choice of taxonomic constraints influences the statistical significance.
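The dependence on the number of sequences searched can be illustrated with a toy model. The sketch below is not the Probity algorithm; it assumes, for illustration only, that every sequence in the collection is an independent random candidate with the same per-sequence probability p of matching the data at least as well as the observed hit. Under that assumption, the chance of at least one false match among N sequences is 1 - (1 - p)^N. The function names `database_p_value` and `simulate_false_positive_rate` are hypothetical and do not come from the article.

```python
import math
import random


def database_p_value(per_sequence_p: float, n_sequences: int) -> float:
    """Chance that at least one of n independent random sequences matches
    as well as the observed hit: 1 - (1 - p)**n, computed stably."""
    if not 0.0 <= per_sequence_p < 1.0:
        raise ValueError("per_sequence_p must be in [0, 1)")
    if n_sequences < 0:
        raise ValueError("n_sequences must be non-negative")
    # expm1/log1p avoid loss of precision when p is very small.
    return -math.expm1(n_sequences * math.log1p(-per_sequence_p))


def simulate_false_positive_rate(per_sequence_p: float, n_sequences: int,
                                 trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity: the fraction of simulated
    searches in which at least one random sequence passes the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # A random sequence "matches" with probability per_sequence_p.
        if any(rng.random() < per_sequence_p for _ in range(n_sequences)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    p = 1e-6  # assumed per-sequence chance of a random match
    for n in (1_000, 10_000, 100_000, 1_000_000):
        print(f"N = {n:>9,d}  P(at least one random match) = "
              f"{database_p_value(p, n):.4g}")
```

In this toy model, a match with a fixed per-sequence probability becomes less significant as the collection grows, which is also why a narrower taxonomic constraint (a smaller N) makes significance easier to reach for the same data.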