Charge State Estimation for Tandem Mass Spectrometry Proteomics

Abstract
High-throughput protein analysis by tandem mass spectrometry produces thousands to millions of spectra that are used for peptide and protein identification. Although each spectrum corresponds to only one peptide ion charge state, database searches are typically repeated for multiple charge states because the resolution of many common mass spectrometers is not sufficient to determine the charge state. The resulting database searches are both error-prone and time-consuming. We describe a straightforward, accurate approach to charge state estimation (CHASTE). CHASTE relies on fragment ion peak distributions and uses logistic regression models to combine different measurements and improve accuracy. CHASTE's performance was validated on data sets composed of known peptide dissociation spectra, obtained by replicate analyses of our previously developed protein standard mixture on ion trap mass spectrometers at different laboratories. CHASTE reduced the number of database searches needed by at least 60% and the number of redundant searches by at least 90%, with virtually no loss of information. This greatly alleviates one of the major bottlenecks in high-throughput peptide and protein identification. Thresholds and parameter estimates can be tailored to specific analysis situations, pipelines, and instruments. CHASTE is implemented in Java with both a GUI and a command-line interface.
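To illustrate the general idea of combining fragment ion peak measurements in a logistic regression model, a minimal Java sketch follows. It is not the CHASTE implementation. The class name, the two features (fraction of peak intensity and fraction of peak count above the precursor m/z), the coefficients, the thresholds, and the grouping of charges 2 and 3 as "multiply charged" are all hypothetical choices made for illustration; in practice the coefficients would be fitted to spectra of known charge state.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; all names and numbers here are assumptions. */
final class ChargeStateSketch {

    // Hypothetical coefficients, not fitted values from the paper.
    static final double INTERCEPT = -2.0;
    static final double[] WEIGHTS = {6.0, 1.5};

    /** Feature 1 (assumed): fraction of total intensity above the precursor m/z. */
    static double intensityFractionAbove(double[] mz, double[] intensity, double precursorMz) {
        double above = 0.0;
        double total = 0.0;
        for (int i = 0; i < mz.length; i++) {
            total += intensity[i];
            if (mz[i] > precursorMz) {
                above += intensity[i];
            }
        }
        return total == 0.0 ? 0.0 : above / total;
    }

    /** Feature 2 (assumed): fraction of peaks above the precursor m/z. */
    static double countFractionAbove(double[] mz, double precursorMz) {
        if (mz.length == 0) {
            return 0.0;
        }
        int above = 0;
        for (double m : mz) {
            if (m > precursorMz) {
                above++;
            }
        }
        return (double) above / mz.length;
    }

    /** Logistic model: estimated probability that the precursor is multiply charged. */
    static double probabilityMultiplyCharged(double[] features) {
        double z = INTERCEPT;
        for (int i = 0; i < features.length; i++) {
            z += WEIGHTS[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /**
     * Chooses which charge states to search. Below `low` only charge 1 is searched,
     * above `high` only charges 2 and 3, and in between all three are kept so that
     * ambiguous spectra are not lost.
     */
    static List<Integer> chargesToSearch(double p, double low, double high) {
        List<Integer> charges = new ArrayList<>();
        if (p < high) {
            charges.add(1);
        }
        if (p > low) {
            charges.add(2);
            charges.add(3);
        }
        return charges;
    }
}
```

In this sketch, widening the gap between `low` and `high` sends more spectra to all searches and lowers the risk of discarding the correct charge state, while narrowing it removes more searches; this is one way the tailoring of thresholds to a given pipeline or instrument could look.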