Missing value estimation for DNA microarray gene expression data: local least squares imputation
Open Access
- 27 August 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (2), 187-198
- https://doi.org/10.1093/bioinformatics/bth499
Abstract
Motivation: Gene expression data often contain missing expression values. Effective missing value estimation methods are needed because many algorithms for gene expression data analysis require a complete matrix of gene array values. In this paper, imputation methods based on the least squares formulation are proposed to estimate missing values in gene expression data. These methods exploit local similarity structures in the data together with least squares optimization.
Results: The proposed local least squares imputation method (LLSimpute) represents a target gene that has missing values as a linear combination of similar genes. The similar genes are chosen either as the k nearest neighbors or as the k coherent genes that have large absolute values of Pearson correlation coefficients. A non-parametric missing value estimation variant of LLSimpute is obtained by introducing an automatic k-value estimator. In our experiments, the proposed LLSimpute method shows competitive results when compared with other imputation methods for missing value estimation on various datasets and percentages of missing values in the data.
Availability: The software is available at http://www.cs.umn.edu/~hskim/tools.html
Contact: hpark@cs.umn.edu
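The idea described in the abstract, writing a target gene as a linear combination of its k most similar genes, can be illustrated with a minimal sketch. This is not the authors' LLSimpute software. The function name `lls_impute_gene`, the use of NumPy, the NaN encoding of missing values, the restriction of candidate neighbors to genes with no missing values, and the Euclidean distance are all assumptions made for illustration. The Pearson correlation variant and the automatic k-value estimator are not shown.

```python
import numpy as np


def lls_impute_gene(data, target, k):
    """Fill the NaN entries of row `target` using its k nearest complete rows.

    Hypothetical helper: `data` is a genes x experiments array in which
    missing values are encoded as NaN (an assumption of this sketch).
    """
    row = data[target]
    missing = np.isnan(row)
    observed = ~missing

    # Candidate neighbors: genes with no missing values (a simplification).
    complete = np.where(~np.isnan(data).any(axis=1))[0]
    complete = complete[complete != target]

    # Rank candidates by Euclidean distance over the target's observed columns.
    dists = np.linalg.norm(data[complete][:, observed] - row[observed], axis=1)
    nearest = complete[np.argsort(dists)[:k]]

    A = data[nearest][:, observed]  # k x (number of observed columns)
    B = data[nearest][:, missing]   # k x (number of missing columns)

    # Least squares: find coefficients x minimizing ||A^T x - w||_2, where w
    # holds the observed values of the target gene.
    coeffs, *_ = np.linalg.lstsq(A.T, row[observed], rcond=None)

    # Apply the same linear combination to the neighbors' values in the
    # columns where the target is missing.
    filled = row.copy()
    filled[missing] = B.T @ coeffs
    return filled


# Toy example: the third gene equals 2 * gene0 + 3 * gene1, last value hidden.
expr = np.array([
    [1.0, 0.0, 2.0, 1.0, 3.0],
    [0.0, 1.0, 1.0, 2.0, 1.0],
    [2.0, 3.0, 7.0, 8.0, np.nan],
])
print(lls_impute_gene(expr, target=2, k=2))
```

In this toy matrix the target gene is an exact linear combination of the two complete genes, so the hidden entry is recovered as approximately 9. Real expression data will not fit exactly, and the quality of the estimate depends on the choice of k.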