Quality Control and Peak Finding for Proteomics Data Collected from Nipple Aspirate Fluid by Surface-Enhanced Laser Desorption and Ionization

Abstract
Researchers have recently begun using mass spectrometry to study cancer. Before proteomics spectra can be used in a clinical setting, stringent quality-control procedures will be needed. We pooled samples of nipple aspirate fluid from healthy breasts and breasts with cancer to prepare a control sample. Aliquots of the control sample were applied to two spots on each of three IMAC ProteinChip arrays (Ciphergen Biosystems, Inc.) on each of 4 successive days to generate 24 surface-enhanced laser desorption and ionization (SELDI) spectra. In 36 subsequent experiments, the control sample was applied to two spots of each ProteinChip array, and the resulting spectra were analyzed to determine how closely they agreed with the original 24 spectra. We describe novel algorithms that (a) locate peaks in unprocessed proteomics spectra and (b) iteratively combine peak detection with baseline correction. These algorithms detected approximately 200 peaks per spectrum, 68 of which were detected in all 24 original spectra. The peaks were highly correlated across samples, and six principal components were enough to explain 80% of the variance. Using a criterion that rejects a chip if the Mahalanobis distances from both of its control spectra to the center of the six-dimensional principal component space exceed the 95% confidence limit, we rejected 5 of the 36 chips. Mahalanobis distance in principal component space provides a method for assessing the reproducibility of proteomics spectra that is robust, effective, easily computed, and statistically sound.
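The rejection criterion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the synthetic reference matrix, and the use of a chi-square quantile (6 degrees of freedom) as the 95% limit are all assumptions made for illustration, since the abstract does not say how the threshold was derived. The sketch assumes each spectrum has already been reduced to a vector of intensities at a common set of peaks.

```python
from typing import NamedTuple

import numpy as np


class PCModel(NamedTuple):
    mean: np.ndarray        # per-peak mean of the reference spectra
    components: np.ndarray  # (k, n_peaks) orthonormal principal axes
    variances: np.ndarray   # (k,) variance of reference scores on each axis
    explained: float        # fraction of total variance captured by the k axes


def fit_reference(X, n_components=6):
    """Fit a principal component model to reference spectra (rows of X)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centered data gives the principal axes as rows of vt.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    eigvals = s**2 / (X.shape[0] - 1)
    explained = float(eigvals[:n_components].sum() / eigvals.sum())
    return PCModel(mean, vt[:n_components], eigvals[:n_components], explained)


def mahalanobis_sq(x, model):
    """Squared Mahalanobis distance to the center of the PC space."""
    scores = (np.asarray(x, dtype=float) - model.mean) @ model.components.T
    return np.sum(scores**2 / model.variances, axis=-1)


# Assumed threshold: 0.95 quantile of a chi-square distribution with 6
# degrees of freedom. The abstract does not specify how its limit was set.
CHI2_95_6DF = 12.592


def reject_chip(spectrum_a, spectrum_b, model, threshold=CHI2_95_6DF):
    """Reject a chip only if BOTH control spectra exceed the threshold."""
    d2 = mahalanobis_sq(np.vstack([spectrum_a, spectrum_b]), model)
    return bool(np.all(d2 > threshold))


# Synthetic stand-in for the 24 reference spectra at 68 common peaks.
# These numbers are random and carry no experimental meaning.
rng = np.random.default_rng(0)
reference = rng.normal(size=(24, 68))
model = fit_reference(reference, n_components=6)

# Two hypothetical control spectra from a new chip.
new_a = rng.normal(size=68)
new_b = rng.normal(size=68)
print("explained variance fraction:", round(model.explained, 3))
print("reject chip:", reject_chip(new_a, new_b, model))
```

Requiring both control spectra to exceed the limit, as the abstract states, means that a single aberrant spot does not by itself cause a chip to be discarded.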