Clustering of Gene Expression Data Based on Shape Similarity
Open Access
- 1 January 2009
- journal article
- research article
- Published by Springer Nature in EURASIP Journal on Bioinformatics and Systems Biology
- Vol. 2009 (1), 195712
- https://doi.org/10.1155/2009/195712
Abstract
A method for clustering genes from expression profiles using shape information is presented. Conventional clustering approaches such as K-means assume that genes with similar functions have similar expression levels, and hence allocate genes with similar expression levels to the same cluster. However, genes with similar function often exhibit similar signal shapes even when their expression magnitudes are far apart. This investigation therefore studies clustering according to signal shape similarity. The shape information is captured in the form of normalized and time-scaled forward first differences, which are then subjected to variational Bayes clustering plus a non-Bayesian (Silhouette) cluster statistic. The statistic shows an improved ability to identify the correct number of clusters and to assign the members of each cluster. Based on initial results for both generated test data and Escherichia coli microarray expression data, and on initial validation of the Escherichia coli results, the method shows promise for better clustering of time-series microarray data according to shape similarity.
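The shape representation described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact normalization, so this example assumes each vector of time-scaled forward first differences is rescaled to unit Euclidean length; the function names `shape_features` and `euclidean` and the sample profiles are hypothetical and not taken from the paper. The variational Bayes clustering and Silhouette steps are not shown.

```python
import math

def shape_features(values, times):
    """Time-scaled forward first differences, rescaled to unit length.

    Assumption: unit-norm scaling stands in for the paper's normalization,
    which the abstract does not specify.
    """
    if len(values) != len(times) or len(values) < 2:
        raise ValueError("need at least two samples and matching time points")
    # Forward first difference divided by the sampling interval:
    # (x[i+1] - x[i]) / (t[i+1] - t[i])
    slopes = [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    ]
    norm = math.sqrt(sum(s * s for s in slopes))
    if norm == 0.0:
        return slopes  # flat profile: nothing to rescale
    return [s / norm for s in slopes]

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical profiles sampled at uneven time points.
times = [0.0, 1.0, 2.0, 4.0, 8.0]
low = [1.0, 2.0, 4.0, 3.0, 1.0]
high = [10.0 * v + 5.0 for v in low]   # same shape, larger magnitude and offset
other = [4.0, 3.0, 1.0, 2.0, 4.0]      # mirrored shape, similar magnitude to `low`

raw_gap = euclidean(low, high)
shape_gap = euclidean(shape_features(low, times), shape_features(high, times))
shape_gap_other = euclidean(shape_features(low, times), shape_features(other, times))
```

Under these assumptions, `low` and `high` are far apart in raw expression space but nearly coincide in the shape feature space, while `low` and `other` are well separated there. Differencing removes the additive offset and the rescaling removes the multiplicative factor, which is why a magnitude-based method such as K-means on raw levels would separate profiles that a shape-based method groups together.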