Peptide mass fingerprinting peak intensity prediction: Extracting knowledge from spectra
- 1 October 2002
- journal article
- research article
- Published by Wiley in Proteomics
- Vol. 2 (10) , 1374-1391
- https://doi.org/10.1002/1615-9861(200210)2:10<1374::aid-prot1374>3.0.co;2-d
Abstract
Matrix‐assisted laser desorption/ionization‐time of flight mass spectrometry has become a valuable tool in proteomics. With the increasing acquisition rate of mass spectrometers, one of the major issues is the development of accurate, efficient and automatic peptide mass fingerprinting (PMF) identification tools. Current tools are mostly based on counting the number of experimental peptide masses that match theoretical masses. Almost all of them use additional criteria such as isoelectric point, molecular weight, PTMs, taxonomy or enzymatic cleavage rules to enhance prediction performance. However, these identification tools seldom use peak intensities as a parameter, because there is currently no model that predicts intensities from the physicochemical properties of peptides. In this work, we used standard data mining methods, namely classification and regression, to find correlations between peak intensities and the properties of the peptides composing a PMF spectrum. These methods were applied to a dataset comprising a series of PMF experiments involving 157 proteins. We found that the C4.5 method gave the most informative results for the classification task (prediction of the presence or absence of a peptide in a spectrum) and M5' for the regression task (prediction of the normalized intensity of a peptide peak). The C4.5 classifier correctly classified 88% of the theoretical peaks, whereas the peak intensities predicted by M5' had a correlation coefficient of 0.6743 with the experimental peak intensities. These methods enabled us to obtain decision and model trees that can be directly used for prediction and identification of PMF results. This work lays the foundations of a method to analyze the factors influencing the peak intensity of PMF spectra. A simple extension of this analysis, using a larger dataset, could improve the accuracy of the results.
Additional peptide characteristics or even PMF experimental parameters can also be taken into account in the data mining process to analyze their influence on the peak intensity. Furthermore, this data mining approach can certainly be extended to the tandem mass spectrometry domain or other mass spectrometry-derived methods.
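The abstract reports two evaluation figures: the fraction of theoretical peaks whose presence or absence was classified correctly, and a correlation coefficient between predicted and experimental intensities. The following is a minimal sketch of how such figures can be computed, assuming the correlation is the usual Pearson coefficient (the abstract does not say which one was used). The function names and the toy values are hypothetical illustrations, not the authors' code or data.

```python
from math import sqrt


def classification_accuracy(predicted, observed):
    """Fraction of theoretical peaks whose presence/absence is predicted correctly."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(predicted)


def pearson_correlation(xs, ys):
    """Pearson correlation between predicted and experimental intensities."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy values, made up for illustration only (not taken from the paper).
    predicted_presence = [True, False, True, True]
    observed_presence = [True, False, False, True]
    print(classification_accuracy(predicted_presence, observed_presence))

    predicted_intensity = [0.10, 0.40, 0.35, 0.80]
    experimental_intensity = [0.15, 0.30, 0.45, 0.70]
    print(pearson_correlation(predicted_intensity, experimental_intensity))
```

In this sketch, presence and absence are encoded as booleans and intensities as normalized floats. Any consistent encoding would work equally well for the accuracy computation.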