Implementation and Uses of Automated de Novo Peptide Sequencing by Tandem Mass Spectrometry

3 May 2001

journal article
research article
Published by American Chemical Society (ACS) in Analytical Chemistry

Vol. 73 (11), 2594-2604
https://doi.org/10.1021/ac001196o

Abstract

There are several computer programs that can match peptide tandem mass spectrometry data to their exactly corresponding database sequences, and in most protein identification projects, these programs are utilized in the early stages of data interpretation. However, situations frequently arise where tandem mass spectral data cannot be correlated with any database sequences. In these cases, the unmatched data could be due to peptides derived from novel proteins, allelic or species-derived variants of known proteins, or posttranslational or chemical modifications. Two additional problems are frequently encountered in high-throughput protein identification. First, it is difficult to quickly sift through large amounts of data to identify those spectra that, due to poor signal or contaminants, can be ignored. Second, it is important to find incorrect database matches (false positives). We have chosen to address these difficulties by performing automatic de novo sequencing using a computer program called Lutefisk. Sequence candidates obtained are used as input in a homology-based database search program called CIDentify to identify variants of known proteins. Comparison of database-derived sequences with de novo sequences allows for electronic validation of database matches even if the latter are not completely correct. Modifications to the original Lutefisk program have been implemented to handle data obtained from triple quadrupole, ion trap, and quadrupole/time-of-flight hybrid (Qtof) mass spectrometers. For example, the linearity of mass errors due to temperature-dependent expansion of the flight tube in a Qtof was exploited such that isobaric amino acids (glutamine/lysine and oxidized methionine/phenylalanine) can be differentiated without careful attention to mass calibration.
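The point about isobaric residues can be illustrated with a small numerical sketch. It is not Lutefisk's implementation: the error model (a constant offset plus a term proportional to m/z), the 50 ppm slope, the 0.03 Da offset, the fragment mass of 500.25, and all function names are assumptions chosen for illustration. Only the monoisotopic residue masses are standard values. The sketch shows that when the mass error is linear in m/z, the spacing between two adjacent fragment ions keeps only a small fraction of that error, so residues about 0.036 Da apart (glutamine/lysine) or 0.033 Da apart (oxidized methionine/phenylalanine) can still be told apart.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "Q": 128.05858,      # glutamine
    "K": 128.09496,      # lysine
    "M(ox)": 147.03540,  # oxidized methionine
    "F": 147.06841,      # phenylalanine
}


def apply_linear_error(mz, slope_ppm=50.0, offset_da=0.03):
    """Assumed error model: observed = true + offset + slope * true."""
    return mz + offset_da + slope_ppm * 1e-6 * mz


def call_residue(lower_ion, upper_ion, candidates):
    """Return the candidate whose residue mass best matches the ion spacing."""
    spacing = upper_ion - lower_ion
    return min(candidates, key=lambda r: abs(RESIDUE_MASS[r] - spacing))


def simulate_pair(residue, lower_true=500.25):
    """Two adjacent fragment ions one residue apart, with the error applied."""
    upper_true = lower_true + RESIDUE_MASS[residue]
    return apply_linear_error(lower_true), apply_linear_error(upper_true)


# Hypothetical demonstration: each residue is recovered from the ion spacing
# even though the absolute error of the upper ion is larger than the mass gap.
for pair in (["Q", "K"], ["M(ox)", "F"]):
    for residue in pair:
        lower, upper = simulate_pair(residue)
        absolute_error = upper - (500.25 + RESIDUE_MASS[residue])
        print(residue, "->", call_residue(lower, upper, pair),
              f"(absolute error of upper ion: {absolute_error:.3f} Da)")
```

Under these assumed parameters the upper ion is off by roughly 0.06 Da in absolute terms, which is more than either mass gap. The spacing between the two ions is off by well under 0.01 Da, because the constant offset cancels and the proportional term scales with the residue mass rather than with the full m/z.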
