The impact of sample imbalance on identifying differentially expressed genes
Open Access
- 12 December 2006
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 7 (S4) , S8
- https://doi.org/10.1186/1471-2105-7-s4-s8
Abstract
Background: Several statistical methods have recently been proposed to identify genes that are differentially expressed between two conditions. However, very few studies consider the problem of sample imbalance, and none has investigated its impact on identifying differentially expressed genes. In addition, it is not clear which method is most suitable for unbalanced data.
Results: Based on random sampling, two evaluation models are proposed to investigate the impact of sample imbalance on identifying differentially expressed genes. Using these models, the performance of six well-known methods is compared on unbalanced data. The experimental results indicate that sample imbalance has a strong influence on the selection of differentially expressed genes. Furthermore, the methods differ considerably in their performance on unbalanced data. Among the six methods, the Welch t-test appears to perform best when the large-variance group contains more samples than the small-variance group, while the Regularized t-test and SAM outperform the others on unbalanced data in the remaining cases.
Conclusion: The two proposed evaluation models are effective, and sample imbalance should be taken into account in microarray experiment design and gene expression data analysis. The results and the two evaluation models can help in selecting a suitable method for processing unbalanced data.
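To make the setting concrete, the following is a minimal sketch of how an unbalanced two-group comparison might be set up with the Welch t-test named in the abstract. It is not the authors' evaluation model: the helper names (`welch_t`, `subsample_unbalanced`), the simulated expression values, and the chosen group sizes are all assumptions made for illustration only.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    se2 = v1 / n1 + v2 / n2  # squared standard error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


def subsample_unbalanced(group_a, group_b, n_a, n_b, rng):
    """Draw n_a and n_b values without replacement (hypothetical helper)."""
    return rng.sample(group_a, n_a), rng.sample(group_b, n_b)


rng = random.Random(0)
# Simulated expression values for a single gene (illustrative, not real data):
# group B is given the larger variance.
group_a = [rng.gauss(0.0, 1.0) for _ in range(30)]
group_b = [rng.gauss(1.0, 3.0) for _ in range(30)]

# Compare a balanced design with two unbalanced ones.
for n_a, n_b in [(15, 15), (5, 25), (25, 5)]:
    sub_a, sub_b = subsample_unbalanced(group_a, group_b, n_a, n_b, rng)
    t, df = welch_t(sub_a, sub_b)
    print(f"n_a={n_a:2d} n_b={n_b:2d} t={t:6.2f} df={df:5.1f}")
```

Repeating the subsampling many times per design, and over many genes, would give a rough picture of how the statistic varies with the degree of imbalance; the paper's own evaluation models may differ from this in detail.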