Extraction, interpretation and validation of information for comparing samples in metabolic LC/MS data sets
- 22 March 2005
- journal article
- research article
- Published by Royal Society of Chemistry (RSC) in The Analyst
- Vol. 130 (5), 701-707
- https://doi.org/10.1039/b501890k
Abstract
LC/MS is an analytical technique that, owing to its high sensitivity, has become increasingly popular for generating metabolic signatures in biological samples and for building metabolic databases. However, to create robust and interpretable (transparent) multivariate models for comparing many samples, the data must fulfil three criteria: (i) each sample is characterized by the same number of variables, (ii) each of these variables is represented across all observations, and (iii) a variable in one sample has the same biological meaning, or represents the same metabolite, in all other samples. In addition, the resulting models must be able to make predictions for, e.g., related and independent samples characterized in the same way as the model samples. The method presented here involves the construction of a representative data set, including automatic peak detection, alignment, setting of retention time windows, summing in the chromatographic dimension, and data compression by means of alternating regression, where the relevant metabolic variation is retained for further modelling using multivariate analysis. This approach has the advantage of allowing large numbers of samples to be compared on the basis of their LC/MS metabolic profiles, and it also provides a means of interpreting the investigated biological system. This includes finding relevant systematic patterns among samples, identifying influential variables, verifying the findings in the raw data, and finally using the models for predictions. The presented strategy was applied here to a population study using urine samples from two cohorts, Shanxi (People’s Republic of China) and Honolulu (USA). The results showed that evaluating the extracted information using partial least squares discriminant analysis (PLS-DA) provided a robust, predictive and transparent model for the metabolic differences between the two populations.
The presented findings suggest that this is a general approach for data handling, analysis, and evaluation of large metabolic LC/MS data sets.
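As an illustration only, two of the steps named in the abstract can be sketched in Python with NumPy: summing intensities inside fixed retention time windows, so that every sample yields the same variables in the same order, and a discriminant model on the resulting matrix. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the half-open window convention, the 0/1 class coding, the 0.5 decision threshold and the use of a single unscaled PLS component are all hypothetical choices, and the peak detection, alignment and alternating regression steps are not shown.

```python
import numpy as np


def sum_in_rt_windows(rt, intensities, window_edges):
    """Sum a (scans x m/z channels) array inside retention time windows.

    Assumption: windows are half-open intervals [edge_i, edge_{i+1}).
    Scans outside all windows are ignored. The result is flattened to a
    vector of length n_windows * n_channels, so every sample processed
    with the same edges gets the same variables in the same order.
    """
    rt = np.asarray(rt, dtype=float)
    intensities = np.asarray(intensities, dtype=float)
    edges = np.asarray(window_edges, dtype=float)
    # Index of the window each scan falls into (-1 or n_windows if outside).
    idx = np.searchsorted(edges, rt, side="right") - 1
    n_windows = len(edges) - 1
    summed = np.zeros((n_windows, intensities.shape[1]))
    for w in range(n_windows):
        summed[w] = intensities[idx == w].sum(axis=0)
    return summed.ravel()


def fit_pls_da(X, y):
    """Fit a one-component PLS model to 0/1 class labels (illustrative)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = Xc.T @ yc                # weights: covariance of each variable with y
    w = w / np.linalg.norm(w)    # assumes at least one variable covaries with y
    t = Xc @ w                   # sample scores on the single component
    q = (t @ yc) / (t @ t)       # regression of y on the scores
    return x_mean, y_mean, w, q


def predict_pls_da(model, X):
    """Return the predicted class value; above 0.5 is read as class 1."""
    x_mean, y_mean, w, q = model
    return y_mean + ((np.asarray(X, dtype=float) - x_mean) @ w) * q


# Usage on made-up numbers: four scans, two m/z channels, two windows.
profile = sum_in_rt_windows(
    rt=[0.5, 1.5, 2.5, 3.5],
    intensities=[[1, 2], [3, 4], [5, 6], [7, 8]],
    window_edges=[0.0, 2.0, 4.0],
)

# Made-up two-variable data matrix with two classes of two samples each.
X_train = [[1, 0], [2, 0], [0, 1], [0, 2]]
y_train = [1, 1, 0, 0]
model = fit_pls_da(X_train, y_train)
is_class_one = predict_pls_da(model, [[3, 0], [0, 3]]) > 0.5
```

The weight vector `w` gives one simple view of which variables influence the separation, which mirrors the abstract's point about identifying influential variables and checking them against the raw data. A real analysis would normally add variable scaling, more components and cross-validation.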