High-Throughput Identification of Proteins and Unanticipated Sequence Modifications Using a Mass-Based Alignment Algorithm for MS/MS de Novo Sequencing Results
- 10 March 2004
- journal article
- research article
- Published by American Chemical Society (ACS) in Analytical Chemistry
- Vol. 76 (8) , 2220-2230
- https://doi.org/10.1021/ac035258x
Abstract
With the increasing availability of de novo sequencing algorithms for interpreting high-mass accuracy tandem mass spectrometry (MS/MS) data, there is a growing need for programs that accurately identify proteins from de novo sequencing results. De novo sequences derived from tandem mass spectra of peptides often contain ambiguous regions where the exact amino acid order cannot be determined. One problem this poses for sequence alignment algorithms is the difficulty in distinguishing discrepancies due to de novo sequencing errors from actual genomic sequence variation and posttranslational modifications. We present a novel, mass-based approach to sequence alignment, implemented as a program called OpenSea, to resolve these problems. In this approach, de novo and database sequences are interpreted as masses of residues, and the masses, rather than the amino acid codes, are compared. To provide further flexibility, the masses can be aligned in groups, which can resolve many de novo sequencing errors. The performance of OpenSea was tested with three types of data: a mixture of known proteins, a mixture of unknown proteins that commonly contain sequence variations, and a mixture of posttranslationally modified known proteins. In all three cases, we demonstrate that OpenSea can identify more peptides and proteins than commonly used database-searching programs (SEQUEST and ProteinLynx) while accurately locating sequence variation sites and unanticipated posttranslational modifications in a high-throughput environment.
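The idea of comparing residue masses in groups, rather than amino acid codes one by one, can be illustrated with a minimal Python sketch. This is not the OpenSea implementation: the function name `mass_align`, the 0.02 Da tolerance, the group size limit of 3, and the simple yes/no dynamic program are all assumptions made for illustration. The residue values are standard monoisotopic masses.

```python
# Illustrative sketch only; not the published OpenSea algorithm.
# Standard monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def prefix_masses(seq):
    """Return cumulative residue masses, starting with 0.0."""
    sums = [0.0]
    for aa in seq:
        sums.append(sums[-1] + MONO[aa])
    return sums


def mass_align(de_novo, database, tol=0.02, max_group=3):
    """Return True if the two sequences can be split into consecutive
    groups (1..max_group residues each) whose masses agree within tol.

    tol and max_group are hypothetical parameters for this sketch.
    """
    pa, pb = prefix_masses(de_novo), prefix_masses(database)
    n, m = len(de_novo), len(database)
    # reachable[i][j]: the first i de novo residues and the first j
    # database residues can be aligned group by group.
    reachable = [[False] * (m + 1) for _ in range(n + 1)]
    reachable[0][0] = True
    for i in range(n + 1):
        for j in range(m + 1):
            if not reachable[i][j]:
                continue
            for gi in range(1, min(max_group, n - i) + 1):
                for gj in range(1, min(max_group, m - j) + 1):
                    group_a = pa[i + gi] - pa[i]
                    group_b = pb[j + gj] - pb[j]
                    if abs(group_a - group_b) <= tol:
                        reachable[i + gi][j + gj] = True
    return reachable[n][m]


if __name__ == "__main__":
    print(mass_align("PEPTLDE", "PEPTIDE"))  # L and I share a mass
    print(mass_align("PETPIDE", "PEPTIDE"))  # ambiguous residue order
    print(mass_align("PEGGK", "PENK"))       # G+G has the mass of N
    print(mass_align("PEPTIDE", "PEPTADE"))  # a real mass difference
```

Under these assumptions, the first three comparisons align because the grouped masses agree, even though the letter sequences differ, while the last one does not because the I-to-A substitution changes the mass. A fuller treatment would score alignments and report the location and size of unexplained mass shifts, which this sketch does not attempt.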