Robust accurate identification of peptides (RAId): deciphering MS2 data using a structured library search with de novo based statistics
Open Access
- 16 August 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (19) , 3726-3732
- https://doi.org/10.1093/bioinformatics/bti620
Abstract
Motivation: The key to MS-based proteomics is peptide sequencing. The major challenge in peptide sequencing, whether library search or de novo, is to better infer statistical significance and better attain noise reduction. Since the noise in a spectrum depends on experimental conditions, the instrument used and many other factors, it cannot be predicted even if the peptide sequence is known. The characteristics of the noise can only be uncovered once a spectrum is given. We wish to overcome such issues.
Results: We designed RAId to identify peptides from their associated tandem mass spectrometry data. RAId performs a novel de novo sequencing followed by a search in a peptide library that we created. Through de novo sequencing, we establish the spectrum-specific background score statistics for the library search. When the database search fails to return significant hits, the top-ranking de novo sequences become potential candidates for new peptides that are not yet in the database. The use of spectrum-specific background statistics seems to enable RAId to perform well even when the spectral quality is marginal. Other important features of RAId include its potential in de novo sequencing alone and the ease of incorporating post-translational modifications.
Availability: Programs implementing the methods described are available from the authors on request.
Contact: yyu@ncbi.nlm.nih.gov
Supplementary information: ftp://ftp.ncbi.nih.gov/pub/yyu/Proteomics/MSMS/RAId/MSMS_bioinfo_supp.pdf
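The abstract's central idea, that scores from de novo sequencing of a spectrum can serve as that spectrum's own background for judging library hits, can be illustrated with a minimal sketch. The abstract does not specify RAId's actual scoring function or statistical model, so the code below swaps in a simple empirical tail fraction with add-one smoothing as an assumption. The function name `empirical_p_value`, the score values, and the peptide labels are all hypothetical.

```python
from bisect import bisect_left


def empirical_p_value(background_scores, hit_score):
    """Estimate how often the background scores reach hit_score.

    background_scores: scores of de novo candidates for ONE spectrum
    (assumed here to stand in for the spectrum-specific background).
    Returns (count of background scores >= hit_score + 1) / (n + 1),
    so the estimate is never exactly zero.
    """
    ordered = sorted(background_scores)
    n = len(ordered)
    # Index of the first background score that is >= hit_score.
    first_ge = bisect_left(ordered, hit_score)
    n_ge = n - first_ge
    return (n_ge + 1) / (n + 1)


# Hypothetical de novo scores for a single spectrum (not real data).
background = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0]

# Hypothetical library-search hits for the same spectrum.
library_hits = {"PEPTIDEA": 7.2, "PEPTIDEB": 3.1}

for peptide, score in library_hits.items():
    print(peptide, empirical_p_value(background, score))
```

In this toy setting a hit that outscores every de novo candidate gets the smallest attainable estimate, while a hit that sits in the middle of the background does not stand out. Because the background is recomputed per spectrum, a noisy spectrum with inflated de novo scores automatically raises the bar for its library hits.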