A Method of Automated Nonparametric Content Analysis for Social Science
- 28 December 2009
- journal article
- research article
- Published by Wiley in American Journal of Political Science
- Vol. 54 (1), 229-247
- https://doi.org/10.1111/j.1540-5907.2009.00428.x
Abstract
The increasing availability of digitized text presents enormous opportunities for social scientists. Yet hand coding many blogs, speeches, government records, newspapers, or other sources of unstructured text is infeasible. Although computer scientists have methods for automated content analysis, most are optimized to classify individual documents, whereas social scientists instead want generalizations about the population of documents, such as the proportion in a given category. Unfortunately, even a method with a high percent of individual documents correctly classified can be hugely biased when estimating category proportions. By directly optimizing for this social science goal, we develop a method that gives approximately unbiased estimates of category proportions even when the optimal classifier performs poorly. We illustrate with diverse data sets, including the daily expressed opinions of thousands of people about the U.S. presidency. We also make available software that implements our methods and large corpora of text for further analysis.
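The abstract's claim that a classifier with a high percent correctly classified can still badly misestimate category proportions can be made concrete with a small arithmetic sketch. The code below is not the authors' method or software. It is a two-category illustration under assumed numbers: a true category share of 0.10, a true positive rate of 0.80, and a false positive rate of 0.10. The function names are hypothetical. The last function applies a standard misclassification correction, which assumes the error rates are known exactly.

```python
def accuracy(true_share, tpr, fpr):
    """Expected fraction of documents classified correctly."""
    return true_share * tpr + (1.0 - true_share) * (1.0 - fpr)


def classify_and_count(true_share, tpr, fpr):
    """Expected share of documents the classifier labels as the category.

    This is what you get by classifying every document and tallying labels.
    """
    return true_share * tpr + (1.0 - true_share) * fpr


def corrected_share(observed_share, tpr, fpr):
    """Invert the misclassification relation to recover the true share.

    Assumes tpr and fpr are known and differ from each other (illustrative only).
    """
    if tpr == fpr:
        raise ValueError("tpr and fpr must differ for the correction to be defined")
    return (observed_share - fpr) / (tpr - fpr)


if __name__ == "__main__":
    true_share, tpr, fpr = 0.10, 0.80, 0.10  # assumed values for illustration
    observed = classify_and_count(true_share, tpr, fpr)
    print(f"accuracy:           {accuracy(true_share, tpr, fpr):.2f}")
    print(f"classify-and-count: {observed:.2f}")
    print(f"corrected share:    {corrected_share(observed, tpr, fpr):.2f}")
```

Under these assumed rates the classifier is right on 89% of documents, yet tallying its labels gives a category share of 0.17 against a true share of 0.10. The aggregate correction recovers 0.10 without changing any individual document's label.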