Data reduction for spectral clustering to analyze high throughput flow cytometry data
Open Access
- 28 July 2010
- journal article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 11 (1) , 403
- https://doi.org/10.1186/1471-2105-11-403
Abstract
Background: Recent biological discoveries have shown that clustering large datasets is essential for better understanding biology in many areas. Spectral clustering in particular has proven to be a powerful tool amenable to many applications. However, it cannot be directly applied to large datasets due to time and memory limitations. To address this issue, we have modified spectral clustering by adding an information-preserving sampling procedure and applying a post-processing stage. We call this entire algorithm SamSPECTRAL.

Results: We tested our algorithm on flow cytometry data as an example of large, multidimensional data containing potentially hundreds of thousands of data points (i.e., "events" in flow cytometry, typically corresponding to cells). Compared to two state-of-the-art model-based flow cytometry clustering methods, SamSPECTRAL demonstrates significant advantages in properly identifying populations with non-elliptical shapes, low-density populations close to dense ones, minor subpopulations of a major population, and rare populations.

Conclusions: This work is the first successful attempt to apply spectral methodology to flow cytometry data. An implementation of our algorithm as an R package is freely available through Bioconductor.
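The abstract describes the pipeline only at a high level: sample the data, run spectral clustering on the sample, then post-process. The following is a minimal Python/NumPy sketch of the first two stages under stated assumptions. It uses a greedy radius-based sampler, in which each randomly chosen representative absorbs every unassigned point within a hypothetical `radius`. It then builds a Gaussian affinity with a hypothetical bandwidth `sigma` and applies a normalized spectral embedding in the style of Ng, Jordan and Weiss, followed by k-means. Each point finally inherits its representative's label. This is not the SamSPECTRAL implementation. It does not reproduce the package's sampling rule, its similarity definition, or its post-processing stage, and all function names here are illustrative.

```python
import numpy as np


def radius_sample(X, radius, rng):
    """Greedy radius-based sampling (an assumption, not SamSPECTRAL's exact rule).

    Repeatedly pick a random unassigned point as a representative and assign
    every unassigned point within `radius` of it to that representative.
    Returns the representative indices and, for each point, the position of
    its representative in that index array.
    """
    community = np.full(X.shape[0], -1)
    reps = []
    unassigned = np.arange(X.shape[0])
    while unassigned.size > 0:
        idx = rng.choice(unassigned)
        dist = np.linalg.norm(X[unassigned] - X[idx], axis=1)
        community[unassigned[dist <= radius]] = len(reps)  # idx itself has dist 0
        reps.append(idx)
        unassigned = unassigned[dist > radius]
    return np.array(reps), community


def kmeans(U, k, iters=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [U[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(U - c, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        assign = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new = np.array([U[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign


def spectral_cluster(R, k, sigma):
    """Normalized spectral clustering on the representatives R."""
    sq = ((R[:, None, :] - R[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))          # Gaussian affinity; sigma is a free parameter
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-300))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(M)                    # eigenvalues come back in ascending order
    U = vecs[:, -k:]                               # k leading eigenvectors
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-300)  # row-normalize
    return kmeans(U, k)


def sampled_spectral_clustering(X, k, radius, sigma, seed=0):
    """Cluster only the representatives, then give each point its representative's label."""
    rng = np.random.default_rng(seed)
    reps, community = radius_sample(X, radius, rng)
    rep_labels = spectral_cluster(X[reps], k, sigma)
    return rep_labels[community]


# Toy data: two well-separated 2-D blobs standing in for "events" (not real cytometry data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(200, 2)),
               rng.normal(10.0, 0.5, size=(200, 2))])
labels = sampled_spectral_clustering(X, k=2, radius=0.5, sigma=1.0)
```

The eigendecomposition runs on a matrix whose size is the number of representatives, not the number of events. This is the point of sampling before spectral clustering: the dense affinity matrix over hundreds of thousands of events would not fit in memory, while the one over a few thousand representatives does.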