Protein and peptide identification algorithms using MS for use in high-throughput, automated pipelines
- 1 November 2005
- journal article
- review article
- Published by Wiley in Proteomics
- Vol. 5 (16) , 4082-4095
- https://doi.org/10.1002/pmic.200402091
Abstract
Current proteomics experiments can generate vast quantities of data very quickly, but this has not been matched by data analysis capabilities. Although there have been a number of recent reviews covering various aspects of peptide and protein identification methods using MS, comparisons of which methods are either the most appropriate for, or the most effective at, their proposed tasks are not readily available. As the need for high‐throughput, automated peptide and protein identification systems increases, the creators of such pipelines need to be able to choose algorithms that are going to perform well both in terms of accuracy and computational efficiency. This article therefore provides a review of the currently available core algorithms for PMF, database searching using MS/MS, sequence tag searches and de novo sequencing. We also assess the relative performances of a number of these algorithms. As there is limited reporting of such information in the literature, we conclude that there is a need for the adoption of a system of standardised reporting on the performance of new peptide and protein identification algorithms, based upon freely available datasets. We go on to present our initial suggestions for the format and content of these datasets.