A probabilistic approach to spatiotemporal theme pattern mining on weblogs
- 23 May 2006
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 533-542
- https://doi.org/10.1145/1135777.1135857
Abstract
Mining subtopics from weblogs and analyzing their spatiotemporal patterns have applications in multiple domains. In this paper, we define the novel problem of mining spatiotemporal theme patterns from weblogs and propose a novel probabilistic approach to model the subtopic themes and spatiotemporal theme patterns simultaneously. The proposed model discovers spatiotemporal theme patterns by (1) extracting common themes from weblogs; (2) generating theme life cycles for each given location; and (3) generating theme snapshots for each given time period. Evolution of patterns can be discovered by comparative analysis of theme life cycles and theme snapshots. Experiments on three different data sets show that the proposed approach can discover interesting spatiotemporal theme patterns effectively. The proposed probabilistic model is general and can be used for spatiotemporal text mining on any domain with time and location information.
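The second and third steps named in the abstract can be illustrated with a small sketch. It is not the paper's probabilistic model: it assumes that per-document theme weights have already been estimated by some mixture model, and it only aggregates them by location and time. The record layout, the function names (`theme_life_cycle`, `theme_snapshot`), and the toy values are all hypothetical.

```python
from collections import defaultdict


def normalize(weights):
    """Scale non-negative weights so they sum to 1 (empty dict if the total is 0)."""
    total = sum(weights.values())
    if total <= 0:
        return {}
    return {key: value / total for key, value in weights.items()}


def theme_life_cycle(docs, theme, location):
    """Relative strength of one theme over time periods at a given location."""
    strength = defaultdict(float)
    for doc in docs:
        if doc["location"] == location:
            strength[doc["time"]] += doc["themes"].get(theme, 0.0)
    return normalize(strength)


def theme_snapshot(docs, time):
    """Relative strength of every (theme, location) pair within one time period."""
    strength = defaultdict(float)
    for doc in docs:
        if doc["time"] == time:
            for theme, weight in doc["themes"].items():
                strength[(theme, doc["location"])] += weight
    return normalize(strength)


# Toy records with made-up theme weights (assumed to come from a fitted model).
docs = [
    {"time": "week1", "location": "TX", "themes": {"storm": 0.8, "aid": 0.2}},
    {"time": "week2", "location": "TX", "themes": {"storm": 0.2, "aid": 0.8}},
    {"time": "week1", "location": "CA", "themes": {"storm": 0.5, "aid": 0.5}},
]

life_cycle = theme_life_cycle(docs, "storm", "TX")
snapshot = theme_snapshot(docs, "week1")
print(life_cycle)
print(snapshot)
```

Comparing life cycles across locations, or snapshots across time periods, is the kind of comparative analysis the abstract refers to; in the paper these quantities come from the fitted model rather than from a post-hoc aggregation like this one.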