Popitam: Towards new heuristic strategies to improve protein identification from tandem mass spectrometry data
- 16 June 2003
- journal article
- research article
- Published by Wiley in Proteomics
- Vol. 3 (6) , 870-878
- https://doi.org/10.1002/pmic.200300402
Abstract
In recent years, proteomics research has gained importance due to increasingly powerful techniques in protein purification, mass spectrometry and identification, and due to the development of extensive protein and DNA databases from various organisms. Nevertheless, current identification methods from spectrometric data have difficulties in handling modifications or mutations in the source peptide. Moreover, they have low performance when run on large databases (such as genomic databases), or with low quality data, for example due to bad calibration or low fragmentation of the source peptide. We present a new algorithm dedicated to automated protein identification from tandem mass spectrometry (MS/MS) data by searching a peptide sequence database. Our identification approach shows promising properties for solving the specific difficulties enumerated above. It consists of matching theoretical peptide sequences issued from a database with a structured representation of the source MS/MS spectrum. The representation is similar to the spectrum graphs commonly used by de novo sequencing software. The identification process involves the parsing of the graph in order to emphasize relevant sections for each theoretical sequence, and leads to a list of peptides ranked by a correlation score. The parsing of the graph, which can be a highly combinatorial task, is performed by a bio-inspired algorithm called Ant Colony Optimization.
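The general idea of matching a candidate sequence against a spectrum graph can be illustrated with a minimal Python sketch. It is not the Popitam algorithm: it assumes every peak is a prefix (b-type) ion mass, it replaces the Ant Colony Optimization search with an exhaustive frontier walk, and it uses the length of the longest matched residue run as a stand-in for the correlation score. All function names and the tolerance value are hypothetical choices made for illustration; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def build_spectrum_graph(peaks, tol=0.02):
    """Connect peak i to peak j when their mass gap matches a residue mass.

    Returns the sorted peaks and a dict mapping each node index to a list
    of (target_index, residue_letter) edges. Assumption: every peak is a
    prefix ion mass, which real spectra do not guarantee.
    """
    peaks = sorted(peaks)
    edges = {i: [] for i in range(len(peaks))}
    for i in range(len(peaks)):
        for j in range(i + 1, len(peaks)):
            gap = peaks[j] - peaks[i]
            for residue, mass in RESIDUE_MASS.items():
                if abs(gap - mass) <= tol:
                    edges[i].append((j, residue))
    return peaks, edges


def longest_matched_run(peptide, edges):
    """Length of the longest substring of `peptide` spelled by a graph path."""
    best = 0
    for start in range(len(peptide)):
        frontier = set(edges)  # a run may begin at any node
        length = 0
        for residue in peptide[start:]:
            frontier = {j for i in frontier
                        for j, label in edges[i] if label == residue}
            if not frontier:
                break
            length += 1
        best = max(best, length)
    return best


def rank_candidates(peaks, candidates, tol=0.02):
    """Rank database peptides by the toy score, best first."""
    _, edges = build_spectrum_graph(peaks, tol)
    scored = [(pep, longest_matched_run(pep, edges)) for pep in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy spectrum: prefix masses of the peptide "GAS" plus one proton.
toy_peaks = [1.00728, 58.02874, 129.06585, 216.09788]
ranking = rank_candidates(toy_peaks, ["WWW", "GAT", "GAS"])
```

In this toy case the gap between the first and third peaks also matches glutamine, because glycine plus alanine has almost the same mass as Q. Ambiguities of this kind are one reason the graph offers many alternative paths, and why parsing it for each theoretical sequence can become combinatorial.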