Evaluation of normalization methods for cDNA microarray data by k-NN classification

Open Access

26 July 2005

journal article
research article
Published by Springer Nature in BMC Bioinformatics

Vol. 6 (1) , 191
https://doi.org/10.1186/1471-2105-6-191

Abstract

Background: Non-biological factors give rise to unwanted variation in cDNA microarray data. Many normalization methods have been designed to remove such variation. However, to date there have been few published systematic evaluations of how well these techniques remove variation arising from dye biases in the context of downstream, higher-order analytical tasks such as classification.
Results: Ten location normalization methods that adjust spatial- and/or intensity-dependent dye biases, and three scale methods that adjust scale differences, were applied, individually and in combination, to five distinct, published, cancer biology-related cDNA microarray data sets. Leave-one-out cross-validation (LOOCV) classification error was employed as the quantitative end-point for assessing the effectiveness of a normalization method. In particular, an established classifier, k-nearest neighbor (k-NN), was built from data normalized using a given technique, and the LOOCV error rate of the ensuing model was computed. We found that k-NN classifiers are sensitive to dye biases in the data. Using NONRM and GMEDIAN as baseline methods, our results show that single-bias-removal techniques, which remove either spatial-dependent dye bias (referred to below as the spatial effect) or intensity-dependent dye bias (referred to below as the intensity effect), moderately reduce LOOCV classification errors, whereas double-bias-removal techniques, which remove both the spatial and the intensity effect, reduce LOOCV classification errors even further. Of the 41 different strategies examined, three two-step processes, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, all of which removed the intensity effect globally and the spatial effect locally, appear to reduce LOOCV classification errors most consistently and effectively across all data sets. We also found that the investigated scale normalization methods do not reduce LOOCV classification error.
Conclusion: Using LOOCV error of k-NNs as the evaluation criterion, three double-bias-removal normalization strategies, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, outperform other strategies for removing spatial effect, intensity effect and scale differences from cDNA microarray data. The apparent sensitivity of k-NN LOOCV classification error to dye biases suggests that this criterion provides an informative measure for evaluating normalization methods. All the computational tools used in this study were implemented using the R language for statistical computing and graphics.
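
The evaluation criterion described in the abstract can be illustrated with a small sketch. The study's own tools were written in R; the Python below is not the authors' code. It assumes Euclidean distance, k = 1, and a contrived toy data set in which each array carries an array-wide additive offset. The function `median_center` is a generic global median shift used as a stand-in for a location normalization and may not match the paper's definition of GMEDIAN. All names are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def median_center(array):
    """Subtract the array-wide median from every log-ratio (global location shift)."""
    m = median(array)
    return [v - m for v in array]


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training samples closest to query (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loocv_error(samples, labels, k=1):
    """Fraction of samples misclassified when each one is held out in turn."""
    errors = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if knn_predict(train_x, train_y, samples[i], k) != labels[i]:
            errors += 1
    return errors / len(samples)


# Toy log-ratios for 5 genes: classes differ only in the first two genes.
signal = {"A": [1, 1, 0, 0, 0], "B": [-1, -1, 0, 0, 0]}
# Assumed array-wide offsets that mimic a per-array bias (illustrative values).
design = [("A", -2), ("A", 0), ("A", 2), ("B", -1), ("B", 1), ("B", 3)]

labels = [cls for cls, _ in design]
raw = [[v + offset for v in signal[cls]] for cls, offset in design]
centered = [median_center(array) for array in raw]

print("LOOCV error, raw:", loocv_error(raw, labels, k=1))
print("LOOCV error, median-centered:", loocv_error(centered, labels, k=1))
```

On this toy data the offsets place every array closest to an array of the other class, so the raw LOOCV error is 1.0, while centering removes the offsets and brings the error to 0.0. The data were constructed to make the effect visible and say nothing about the magnitude of improvement on real microarray data.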
