Preprocessing, classification modeling and feature selection using flow injection electrospray mass spectrometry metabolite fingerprint data
- 28 February 2008
- journal article
- research article
- Published by Springer Nature in Nature Protocols
- Vol. 3 (3) , 446-470
- https://doi.org/10.1038/nprot.2007.511
Abstract
Metabolome analysis by flow injection electrospray mass spectrometry (FIE-MS) fingerprinting generates measurements for large numbers of m/z signals. Such data sets often exhibit high variance and a paucity of replicates, which makes data mining challenging. We describe data preprocessing and modeling methods that have proved reliable in projects involving samples from a range of organisms. The protocols use metabolomics-specific software resources provided in FIEmspro (http://users.aber.ac.uk/jhd), a Web-accessible data analysis package written in the R environment that requires a moderate knowledge of R command-line usage. Specific emphasis is placed on describing the outcome of modeling experiments using FIE-MS data that require further preprocessing to improve quality. The salient features of both poor and robust (i.e., highly generalizable) multivariate models are outlined, together with advice on validating classifiers and avoiding false discovery when seeking explanatory variables.
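The risk of false discovery mentioned above arises because each of thousands of m/z signals is tested separately, so some signals will look significant by chance. As an illustration only, the sketch below applies the standard Benjamini-Hochberg step-up correction to per-signal p-values. It is not taken from the protocol or from FIEmspro; the helper names `benjamini_hochberg` and `significant_signals` are hypothetical, and the sketch assumes one p-value per m/z signal from some univariate two-class test.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Hypothetical helper: each p-value is assumed to come from a univariate
    test of one m/z signal between two sample classes.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of signals sorted by ascending raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value down so that the adjusted values are
    # monotone: q_(k) = min over j >= k of p_(j) * m / j.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_signals(p_values: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Indices of signals whose adjusted p-value is at or below alpha."""
    q = benjamini_hochberg(p_values)
    return [i for i, qi in enumerate(q) if qi <= alpha]


if __name__ == "__main__":
    # Made-up p-values for six hypothetical m/z signals (not real data).
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.60]
    print(benjamini_hochberg(p))
    print(significant_signals(p, alpha=0.05))
```

With these invented values, five of the six raw p-values fall below 0.05, but only signals 0 and 1 survive the correction. This is the kind of shrinkage in candidate explanatory variables that a multiple-testing adjustment is meant to expose.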