Estimating Dataset Size Requirements for Classifying DNA Microarray Data
- 1 April 2003
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 10 (2) , 119-142
- https://doi.org/10.1089/106652703321825928
Abstract
A statistical methodology for estimating dataset size requirements for classifying microarray data using learning curves is introduced. The goal is to use existing classification results to estimate dataset size requirements for future classification experiments and to evaluate the gain in accuracy and significance of classifiers built with additional data. The method is based on fitting inverse power-law models to construct empirical learning curves. It also includes a permutation test procedure to assess the statistical significance of classification performance for a given dataset size. This procedure is applied to several molecular classification problems representing a broad spectrum of levels of complexity.
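The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the model's exact form or the fitting procedure, so the parameterization e(n) = a * n^(-alpha) + b, the grid search over b, and the function names `fit_inverse_power_law`, `predict_error`, and `permutation_p_value` are all assumptions made here for illustration. The permutation p-value uses the common add-one empirical estimate, which may differ from the paper's procedure.

```python
import math

def fit_inverse_power_law(sizes, errors, grid_steps=200):
    """Fit e(n) = a * n**(-alpha) + b (assumed form) to observed error rates.

    For each candidate asymptotic error b on a grid in [0, min(errors)),
    log(e - b) is regressed on log(n) by ordinary least squares; the
    candidate with the smallest squared error in the original scale wins.
    Assumes every error rate is strictly positive.
    """
    xs = [math.log(n) for n in sizes]
    x_mean = sum(xs) / len(xs)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    e_min = min(errors)
    best = None
    for k in range(grid_steps):
        b = e_min * k / grid_steps  # candidate asymptote, always below min(errors)
        ys = [math.log(e - b) for e in errors]
        y_mean = sum(ys) / len(ys)
        slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
        a = math.exp(y_mean - slope * x_mean)
        alpha = -slope
        sse = sum((a * n ** (-alpha) + b - e) ** 2 for n, e in zip(sizes, errors))
        if best is None or sse < best[0]:
            best = (sse, a, alpha, b)
    _, a, alpha, b = best
    return a, alpha, b

def predict_error(n, a, alpha, b):
    """Extrapolate the fitted learning curve to a training set of size n."""
    return a * n ** (-alpha) + b

def permutation_p_value(observed_error, permuted_errors):
    """Empirical p-value: the share of label-permuted runs whose error is
    at least as low as the observed error (add-one estimate, an assumption)."""
    hits = sum(1 for e in permuted_errors if e <= observed_error)
    return (1 + hits) / (1 + len(permuted_errors))

# Hypothetical usage with made-up numbers (not data from the paper):
sizes = [10, 20, 40, 80, 160]
errors = [0.9 * n ** -0.5 + 0.1 for n in sizes]
a, alpha, b = fit_inverse_power_law(sizes, errors)
projected = predict_error(320, a, alpha, b)
```

In this sketch, `b` plays the role of the error rate the classifier would approach with unlimited data, and `alpha` the rate at which it gets there; a dataset size requirement can then be read off by finding the smallest `n` for which `predict_error` drops below a target error.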