Reassessing Google Flu Trends Data for Detection of Seasonal and Pandemic Influenza: A Comparative Epidemiological Study at Three Geographic Scales

Open Access

17 October 2013

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Computational Biology

Vol. 9 (10) , e1003256
https://doi.org/10.1371/journal.pcbi.1003256

Abstract

The goal of influenza-like illness (ILI) surveillance is to determine the timing, location and magnitude of outbreaks by monitoring the frequency and progression of clinical case incidence. Advances in computational and information technology have allowed for automated collection of higher volumes of electronic data and more timely analyses than previously possible. Novel surveillance systems, including those based on internet search query data like Google Flu Trends (GFT), are being used as surrogates for clinically-based reporting of ILI. We investigated the reliability of GFT during the last decade (2003 to 2013), and compared weekly public health surveillance with search query data to characterize the timing and intensity of seasonal and pandemic influenza at the national (United States), regional (Mid-Atlantic) and local (New York City) levels. We identified substantial flaws in the original and updated GFT models at all three geographic scales, including completely missing the first wave of the 2009 influenza A/H1N1 pandemic, and greatly overestimating the intensity of the A/H3N2 epidemic during the 2012/2013 season. These results were obtained for both the original (2008) and the updated (2009) GFT algorithms. The performance of both models was problematic, perhaps because of changes in internet search behavior and differences in the seasonality, geographical heterogeneity and age-distribution of the epidemics between the periods of GFT model-fitting and prospective use. We conclude that GFT data may not provide reliable surveillance for seasonal or pandemic influenza and should be interpreted with caution until the algorithm can be improved and evaluated. Current internet search query data are no substitute for timely local clinical and laboratory surveillance, or national surveillance based on local data collection. New generation surveillance systems such as GFT should incorporate the use of near-real-time electronic health data and computational methods for continued model-fitting and ongoing evaluation and improvement.

Author Summary

In November 2008, Google Flu Trends was launched as an open tool for influenza surveillance in the United States. Engineered as a system for early detection and daily monitoring of the intensity of seasonal influenza epidemics, Google Flu Trends uses internet search data and a proprietary algorithm to provide a surrogate measure of influenza-like illness in the population. During its first season of operation, the novel A/H1N1-pdm influenza virus emerged, heterogeneously causing sporadic outbreaks in the spring and summer of 2009 across many parts of the United States. During the autumn 2009 pandemic wave, Google updated their model with a new algorithm and case definition; the updated model has run prospectively since. Our study asks whether Google Flu Trends provides accurate detection and monitoring of influenza at the national, regional and local geographic scales. Reliable local surveillance is important to reduce uncertainty and improve situational awareness during seasonal epidemics and pandemics. We found substantial flaws with the original and updated Google Flu Trends models, including missing the emergence of the 2009 pandemic and overestimating the 2012/2013 influenza season epidemic. Our work supports the development of local near-real-time computerized syndromic surveillance systems, and collaborative regional, national and international networks.
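
The kind of comparison described above, weekly surveillance data set against a search-based estimate to characterize timing and intensity, can be illustrated with a minimal sketch. This is not the authors' method, which this summary does not specify. The function names (`pearson`, `compare_series`), the three summary measures, and the numbers in `ili` and `gft` are all assumptions invented for illustration and are not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


def compare_series(surveillance, estimate):
    """Summarize agreement between two weekly series for one season.

    Returns the correlation, the difference in peak week (estimate minus
    surveillance, in weeks), and the ratio of peak heights.
    """
    if len(surveillance) != len(estimate) or len(surveillance) < 2:
        raise ValueError("series must have equal length of at least 2")
    weeks = range(len(surveillance))
    peak_s = max(weeks, key=lambda i: surveillance[i])
    peak_e = max(weeks, key=lambda i: estimate[i])
    return {
        "correlation": pearson(surveillance, estimate),
        "peak_lag_weeks": peak_e - peak_s,
        "peak_ratio": estimate[peak_e] / surveillance[peak_s],
    }


# Hypothetical weekly values, made up for this example only.
ili = [1.0, 1.5, 2.5, 4.0, 3.0, 2.0, 1.2]  # surveillance-based ILI proportion
gft = [1.1, 1.8, 3.5, 6.0, 8.0, 4.0, 1.5]  # search-based estimate

result = compare_series(ili, gft)
print(result)
```

In this toy season a peak ratio above 1 would indicate that the estimate overstates intensity, and a positive peak lag would indicate that it peaks later than surveillance. A high correlation alone can hide both problems, which is why timing and intensity are reported separately here.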
