Independent surrogate variable analysis to deconvolve confounding factors in large-scale microarray profiling studies
Open Access
- 6 April 2011
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 27 (11) , 1496-1505
- https://doi.org/10.1093/bioinformatics/btr171
Abstract
Motivation: A common difficulty in large-scale microarray studies is the presence of confounding factors, which may significantly skew estimates of statistical significance and cause unreliable feature selection and high false negative rates. To deal with these difficulties, an algorithmic framework known as Surrogate Variable Analysis (SVA) was recently proposed.

Results: Based on the notion that data can be viewed as an interference pattern, reflecting the superposition of independent effects and random noise, we present a modified SVA, called Independent Surrogate Variable Analysis (ISVA), to identify features correlating with a phenotype of interest in the presence of potential confounding factors. Using simulated data, we show that ISVA performs well in identifying confounders and outperforms methods that do not adjust for confounding. Using four large-scale Illumina Infinium DNA methylation datasets subject to low signal-to-noise ratios and substantial confounding by beadchip effects and variable bisulfite conversion efficiency, we show that ISVA improves the identifiability of confounders and that this enables a framework for feature selection that is more robust to model misspecification and heterogeneous phenotypes. Finally, we demonstrate similar improvements of ISVA across four mRNA expression datasets. Thus, ISVA should be useful as a feature selection tool in studies that are subject to confounding.

Availability: An R-package isva is available from www.cran.r-project.org.

Contact: a.teschendorff@ucl.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
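To make concrete why confounding can skew feature selection, the following minimal sketch may help. It is not the ISVA algorithm and does not use the isva R package: ISVA estimates confounders from the data, whereas here the confounder (a hypothetical `batch` indicator) is assumed to be known. The data, the variable names, and the helper functions `solve_linear_system` and `ols` are all illustrative assumptions. The feature depends only on batch, yet a regression on phenotype alone reports an apparent phenotype effect; including the confounder as a covariate removes it.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without modifying the inputs.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(columns, y):
    """Least-squares coefficients for y ~ intercept + columns (normal equations)."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)]
           for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy data: 8 samples, phenotype partly aligned with batch.
phenotype = [0, 0, 0, 0, 1, 1, 1, 1]
batch = [0, 0, 0, 1, 0, 1, 1, 1]

# The feature is driven by batch only; its true phenotype effect is zero.
feature = [2.0 * b for b in batch]

# Unadjusted model: feature ~ phenotype. The slope is nonzero purely
# because phenotype and batch are correlated.
naive = ols([phenotype], feature)

# Adjusted model: feature ~ phenotype + batch. The phenotype coefficient
# vanishes and the batch coefficient recovers the true effect.
adjusted = ols([phenotype, batch], feature)

print("naive phenotype coefficient:   ", naive[1])
print("adjusted phenotype coefficient:", adjusted[1])
print("adjusted batch coefficient:    ", adjusted[2])
```

In real studies the confounders are usually unobserved or only partly recorded, which is the gap that surrogate variable methods such as SVA and ISVA aim to fill by estimating them from the data before a comparable adjustment is made.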