Classification of a large microarray data set: Algorithm comparison and analysis of drug signatures
Open Access
- 2 May 2005
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 15 (5) , 724-736
- https://doi.org/10.1101/gr.2807605
Abstract
A large gene expression database has been produced that characterizes the gene expression and physiological effects of hundreds of approved and withdrawn drugs, toxicants, and biochemical standards in various organs of live rats. In order to derive useful biological knowledge from this large database, a variety of supervised classification algorithms were compared using a 597-microarray subset of the data. Our studies show that several types of linear classifiers based on Support Vector Machines (SVMs) and Logistic Regression can be used to derive readily interpretable drug signatures with high classification performance. Both methods can be tuned to produce classifiers of drug treatments in the form of short, weighted gene lists, which upon analysis reveal that some of the signature genes have a positive contribution (act as “rewards” for the class-of-interest) while others have a negative contribution (act as “penalties”) to the classification decision. The combination of reward and penalty genes enhances performance by keeping the number of false positive treatments low. The results of these algorithms are combined with feature selection techniques that further reduce the length of the drug signatures, an important step towards the development of useful diagnostic biomarkers and low-cost assays. Multiple signatures with no genes in common can be generated for the same classification end-point. Comparison of these gene lists identifies biological processes characteristic of a given class.
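The idea of a drug signature as a short, weighted gene list can be illustrated with a minimal sketch of a linear decision rule. This is not the authors' implementation: the gene names, weights, intercept, and expression values below are all hypothetical, chosen only to show how a penalty gene can pull a look-alike treatment below the decision threshold.

```python
from typing import Dict, List, Tuple

# Hypothetical signature: gene names and weights are invented for illustration.
# Positive weights act as "rewards", negative weights as "penalties".
SIGNATURE: Dict[str, float] = {
    "gene_A": 1.5,
    "gene_B": 0.5,
    "gene_C": -1.0,
}
BIAS = -0.5  # hypothetical intercept of the linear classifier


def split_signature(weights: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Return (reward genes, penalty genes), each sorted by name."""
    rewards = sorted(g for g, w in weights.items() if w > 0)
    penalties = sorted(g for g, w in weights.items() if w < 0)
    return rewards, penalties


def decision_score(expression: Dict[str, float],
                   weights: Dict[str, float],
                   bias: float) -> float:
    """Linear decision value: bias + sum of weight * expression per gene."""
    return bias + sum(w * expression.get(g, 0.0) for g, w in weights.items())


def in_class(expression: Dict[str, float],
             weights: Dict[str, float],
             bias: float) -> bool:
    """A treatment is assigned to the class-of-interest if the score is positive."""
    return decision_score(expression, weights, bias) > 0


# Hypothetical expression values (e.g. log ratios versus control).
member = {"gene_A": 1.0, "gene_B": 1.0, "gene_C": 0.0}
lookalike = {"gene_A": 1.0, "gene_B": 0.0, "gene_C": 2.0}

print(split_signature(SIGNATURE))
print(decision_score(member, SIGNATURE, BIAS), in_class(member, SIGNATURE, BIAS))
print(decision_score(lookalike, SIGNATURE, BIAS), in_class(lookalike, SIGNATURE, BIAS))
```

In this toy case both treatments induce the reward gene `gene_A`, but the look-alike also induces the penalty gene `gene_C`, so its score falls below zero and it is not counted as a positive. That is the sense in which combining reward and penalty genes helps keep false positives low.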