Advances in nowcasting influenza-like illness rates using search query logs
Open Access
- 3 August 2015
- journal article
- research article
- Published by Springer Nature in Scientific Reports
- Vol. 5 (1), 12760
- https://doi.org/10.1038/srep12760
Abstract
User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012–13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.

This publication has 39 references indexed in Scilit:
- Reassessing Google Flu Trends Data for Detection of Seasonal and Pandemic Influenza: A Comparative Epidemiological Study at Three Geographic Scales. PLoS Computational Biology, 2013
- Nowcasting Events from the Social Web with Statistical Learning. ACM Transactions on Intelligent Systems and Technology, 2012
- Lightweight methods to estimate influenza rates and alcohol sales volume from Twitter messages. Language Resources and Evaluation, 2012
- Assessing Google Flu Trends Performance in the United States during the 2009 Influenza Virus A (H1N1) Pandemic. PLOS ONE, 2011
- Predicting consumer behavior with Web search. Proceedings of the National Academy of Sciences, 2010
- Identification of influential spreaders in complex networks. Nature Physics, 2010
- Infodemiology and Infoveillance: Framework for an Emerging Set of Public Health Informatics Methods to Analyze Search, Communication and Publication Behavior on the Internet. Journal of Medical Internet Research, 2009
- Detecting influenza epidemics using search engine query data. Nature, 2009
- Regularization and Variable Selection Via the Elastic Net. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2005
- Least squares quantization in PCM. IEEE Transactions on Information Theory, 1982
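
The first modeling step named in the abstract, using the Elastic Net to select a small set of informative search queries, can be illustrated with a minimal sketch. This is not the authors' implementation. It is a plain coordinate-descent solver run on synthetic data, where the "query frequencies", the two informative queries, the penalty strength `alpha`, and the mixing weight `l1_ratio` are all assumptions made for illustration. The objective follows the common parametrisation (1/(2n))·||y − Xw||² + alpha·l1_ratio·||w||₁ + 0.5·alpha·(1 − l1_ratio)·||w||².

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha=0.1, l1_ratio=0.9, n_iter=100):
    """Fit Elastic Net weights by cyclic coordinate descent (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1.0 - l1_ratio)
            )
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Synthetic stand-in for weekly query frequencies: 200 weeks, 8 queries.
# Only queries 0 and 3 carry signal; this is an assumption, not real data.
rng = random.Random(0)
n_weeks, n_queries = 200, 8
X = [[rng.gauss(0.0, 1.0) for _ in range(n_queries)] for _ in range(n_weeks)]
y = [2.0 * row[0] + 1.5 * row[3] + rng.gauss(0.0, 0.1) for row in X]

weights = elastic_net(X, y)
selected = [j for j, w_j in enumerate(weights) if w_j != 0.0]
print("selected query indices:", selected)
print("weights:", [round(w_j, 3) for w_j in weights])
```

In the pipeline the abstract describes, the queries with nonzero weight would then be passed to the nonlinear Gaussian Process model. That second stage is not sketched here.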