Developmental stage annotation of Drosophila gene expression pattern images via an entire solution path for LDA
- 1 March 2008
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Knowledge Discovery From Data
- Vol. 2 (1), 1-21
- https://doi.org/10.1145/1342320.1342324
Abstract
Gene expression in a developing embryo occurs in particular cells (spatial patterns) in a time-specific manner (temporal patterns), which leads to the differentiation of cell fates. Images of a Drosophila melanogaster embryo at a given developmental stage, showing a particular gene expression pattern revealed by a gene-specific probe, can be compared for spatial overlaps. The comparison is fundamentally important to formulating and testing gene interaction hypotheses. Expression pattern comparison is most biologically meaningful when images from a similar time point (developmental stage) are compared. In this paper, we present LdaPath, a novel formulation of Linear Discriminant Analysis (LDA) for automatic developmental stage range classification. It employs multivariate linear regression with the L1-norm penalty, controlled by a regularization parameter, for feature extraction and visualization. LdaPath computes an entire solution path for all values of the regularization parameter with essentially the same computational cost as fitting one LDA model. Thus, it facilitates efficient model selection. It is based on the equivalence relationship between LDA and the least squares method for multiclass classification. This equivalence relationship is established under a mild condition, which we show empirically to hold for many high-dimensional datasets, such as expression pattern images. Our experiments on a collection of 2705 expression pattern images show the effectiveness of the proposed algorithm. Results also show that the LDA model resulting from LdaPath is sparse, and irrelevant features may be removed. Thus, LdaPath provides a general framework for simultaneous feature selection and feature extraction.
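The idea of fitting an L1-penalized least squares model to class indicators over a range of regularization values can be illustrated with a minimal sketch. This is not the LdaPath algorithm itself: the abstract does not give its details, so the sketch swaps in plain coordinate descent on a fixed grid of penalty values with warm starts, run on synthetic data. All names (`class_indicator`, `lasso_cd`, `l1_path`), the grid, and the data are hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal step for the L1 penalty)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def class_indicator(labels, n_classes):
    """Column-centered one-hot target matrix, shape (n_samples, n_classes)."""
    Y = np.eye(n_classes)[labels]
    return Y - Y.mean(axis=0)

def lasso_cd(X, y, lam, w, n_iter=100):
    """Coordinate descent for 0.5*||y - Xw||^2 + lam*||w||_1.

    w is the warm start and is updated in place.
    """
    col_sq = (X ** 2).sum(axis=0)
    r = y - X @ w                      # current residual
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * w[j]        # remove feature j's contribution
            w[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
            r -= X[:, j] * w[j]        # add the updated contribution back
    return w

def l1_path(X, labels, n_classes, n_lambdas=10):
    """Approximate the regularization path on a decreasing grid of penalties."""
    Xc = X - X.mean(axis=0)
    Y = class_indicator(labels, n_classes)
    lam_max = np.abs(Xc.T @ Y).max()   # at this penalty every weight is zero
    lambdas = lam_max * np.logspace(0, -3, n_lambdas)
    W = np.zeros((X.shape[1], n_classes))
    path = []
    for lam in lambdas:
        for k in range(n_classes):     # one lasso problem per class column
            lasso_cd(Xc, Y[:, k], lam, W[:, k])
        path.append(W.copy())
    return lambdas, path

# Synthetic data (assumption): 3 classes, 30 features, only the first 3 informative.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(60, 30))
X[:, :3] += 2.0 * np.eye(3)[labels]

lambdas, path = l1_path(X, labels, n_classes=3)
for lam, W in zip(lambdas, path):
    print(f"lambda={lam:9.4f}  nonzero weights={np.count_nonzero(W)}")
```

Each matrix in `path` can be used as a linear projection (`X @ W`) onto as many dimensions as there are classes, and the rows of `W` that stay at zero correspond to features the model ignores at that penalty value. A grid only approximates the path; an exact path method would instead track the penalty values at which features enter or leave the model.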
Funding Information
- National Institutes of Health (HG002516)
- Division of Information and Intelligent Systems (IIS-0612069)