Meta-analysis for Protein Identification: A Case Study on Yeast Data
- 1 June 2010
- journal article
- research article
- Published by Mary Ann Liebert Inc in OMICS: A Journal of Integrative Biology
- Vol. 14 (3), 309-314
- https://doi.org/10.1089/omi.2010.0034
Abstract
Large amounts of mass spectrometry (MS) proteomics data are now publicly available; however, little attention has been given to how to best combine these data and assess the error rates for protein identification. The objective of this article is to show how variation in the type and amount of data included with each study impacts coverage of the yeast proteome and estimation of the false discovery rate (FDR). Our analysis of a subset of the publicly available yeast data showed that failure to reevaluate the FDR when combining protein IDs from different experiments resulted in an underestimation of the FDR by approximately threefold. A worst-case approximation of the FDR was only slightly larger than estimating the FDR by randomized database matches. The use of a weighted model to emphasize the most informative experimental data provided an increase in the number of IDs at a 1% FDR when compared to other meta-analysis approaches. Also, using an FDR higher than 1% results in a very high rate of false discoveries for IDs above the 1% threshold. Ideally, raw MS data will be made publicly available for complete and consistent reanalysis. In the circumstance that raw data are not available, determining a combined FDR on the basis of the worst-case estimation provides a reasonable approximation of the FDR. When combining experimental results, adding more experiments yields diminishing and in some cases negative returns on protein identifications. It may be beneficial to include only those experiments generating the most unique identifications due to solid experimental design and sensitive instrumentation.
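The abstract does not spell out how the worst-case combined FDR is computed. The sketch below illustrates one plausible reading, stated here as an assumption rather than the authors' procedure: correct identifications may overlap across experiments, but no false identification is shared, so the expected false IDs add up while the union of IDs grows more slowly. The function name `worst_case_combined_fdr`, the experiment labels, and the protein names are hypothetical, and the numbers are toy values, not data from the study.

```python
from typing import Dict, Set


def worst_case_combined_fdr(
    experiments: Dict[str, Set[str]], fdr: Dict[str, float]
) -> float:
    """Upper bound on the FDR of the union of protein IDs.

    Assumption (not taken from the article): false IDs are never shared
    between experiments, so their expected counts are summed, whereas
    correct IDs are allowed to overlap in the union.
    """
    # Expected number of false IDs contributed by each experiment.
    expected_false = sum(len(ids) * fdr[name] for name, ids in experiments.items())
    # Distinct protein IDs reported by at least one experiment.
    union = set().union(*experiments.values())
    if not union:
        return 0.0
    return min(1.0, expected_false / len(union))


# Toy data: 99 proteins seen by every experiment, plus one
# experiment-specific ID each (standing in for a false hit).
shared = {f"P{i:03d}" for i in range(99)}
experiments = {
    "exp_a": shared | {"A_only"},
    "exp_b": shared | {"B_only"},
    "exp_c": shared | {"C_only"},
}
per_experiment_fdr = {"exp_a": 0.01, "exp_b": 0.01, "exp_c": 0.01}

combined = worst_case_combined_fdr(experiments, per_experiment_fdr)
print(f"per-experiment FDR: 0.0100, worst-case combined FDR: {combined:.4f}")
```

In this constructed example the combined bound is close to three times the per-experiment FDR, which shows why reusing the original 1% figure for a merged ID list can understate the error rate. The size of the gap here follows from how the toy data were built and says nothing about the magnitude observed in the yeast datasets.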