Prediction of Gene Expression in Embryonic Structures of Drosophila melanogaster
Open Access
- 20 July 2007
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 3 (7), e144
- https://doi.org/10.1371/journal.pcbi.0030144
Abstract
Understanding how sets of genes are coordinately regulated in space and time to generate the diversity of cell types that characterise complex metazoans is a major challenge in modern biology. The use of high-throughput approaches, such as large-scale in situ hybridisation and genome-wide expression profiling via DNA microarrays, is beginning to provide insights into the complexities of development. However, in many organisms the collection and annotation of comprehensive in situ localisation data is a difficult and time-consuming task. Here, we present a widely applicable computational approach, integrating developmental time-course microarray data with annotated in situ hybridisation studies, that facilitates the de novo prediction of tissue-specific expression for genes that have no in vivo gene expression localisation data available. Using a classification approach, trained with data from microarray and in situ hybridisation studies of gene expression during Drosophila embryonic development, we made a set of predictions on the tissue-specific expression of Drosophila genes that have not been systematically characterised by in situ hybridisation experiments. The reliability of our predictions is confirmed by literature-derived annotations in FlyBase, by overrepresentation of Gene Ontology biological process annotations, and, in a selected set, by detailed gene-specific studies from the literature. Our novel organism-independent method will be of considerable utility in enriching the annotation of gene function and expression in complex multicellular organisms.

The task of deciphering the complex transcriptional regulatory networks controlling development is one of the major current challenges for molecular biology. The problem is difficult, if not impossible, to solve without a detailed knowledge of the spatiotemporal dynamics of gene expression. Thus, to understand development, we need to identify and functionally characterize all players in regulatory networks. Data on gene expression dynamics obtained from whole transcriptome microarray experiments, combined with in situ hybridization mRNA localisation patterns for a subset of genes, may provide a route for predicting the localisation of gene expression for those genes for which in situ data has not been generated, as well as suggesting functional information for uncharacterised genes. Here, we report the development of one of the first methods for predicting the localisation of gene expression during Drosophila embryogenesis from microarray data. Pooling the subset of genes in the fly genome with in situ data to form functional units, localised in space and time for relevant developmental processes, facilitates the statement of a classification problem, which we address with machine-learning methods. Our approach promotes a richer annotation of biological function for genes in the absence of costly and time-consuming experimental analysis.
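The classification problem described above can be illustrated with a minimal sketch. It is not the method used in the paper, whose classifier and features are not specified in this summary. It assumes a simple nearest-centroid rule with Pearson correlation as the similarity measure, and all gene names, tissue labels, and expression values are hypothetical toy data. Genes with in situ annotations are pooled per tissue into an average time-course profile, and an unannotated gene is assigned to the tissue whose pooled profile it most resembles.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no shape information
    return cov / sqrt(var_a * var_b)


def train_centroids(profiles, labels):
    """Pool annotated genes per tissue and average their time courses."""
    grouped = {}
    for gene, tissue in labels.items():
        grouped.setdefault(tissue, []).append(profiles[gene])
    return {
        tissue: [sum(col) / len(col) for col in zip(*rows)]
        for tissue, rows in grouped.items()
    }


def predict_tissue(profile, centroids):
    """Return the best-matching tissue and its correlation score."""
    scores = {tissue: pearson(profile, c) for tissue, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy data: expression at five developmental time points.
annotated = {
    "geneA": [0.1, 0.2, 1.5, 2.8, 3.0],
    "geneB": [0.0, 0.3, 1.2, 2.5, 3.2],
    "geneC": [3.1, 2.7, 1.0, 0.3, 0.1],
    "geneD": [2.9, 2.5, 1.3, 0.2, 0.0],
}
# Hypothetical in situ annotations for the genes above.
labels = {
    "geneA": "tissue_late",
    "geneB": "tissue_late",
    "geneC": "tissue_early",
    "geneD": "tissue_early",
}
# A gene with microarray data but no in situ annotation.
unannotated = {"geneX": [0.2, 0.1, 1.0, 2.0, 2.9]}

centroids = train_centroids(annotated, labels)
prediction, score = predict_tissue(unannotated["geneX"], centroids)
print(prediction, round(score, 3))
```

A real analysis would need many more genes per class, support for genes expressed in several tissues, and held-out evaluation before any prediction could be trusted.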