Computer-Assisted Topic Classification for Mixed-Methods Social Science Research
- 15 May 2008
- journal article
- research article
- Published by Taylor & Francis in Journal of Information Technology & Politics
- Vol. 4 (4), 31-46
- https://doi.org/10.1080/19331680801975367
Abstract
Social scientists interested in mixed-methods research have traditionally turned to human annotators to classify the documents or events used in their analyses. The rapid growth of digitized government documents in recent years presents new opportunities for research but also new challenges. With more and more data coming online, relying on human annotators becomes prohibitively expensive for many tasks. For researchers interested in saving time and money while maintaining confidence in their results, we show how a particular supervised learning system can provide estimates of the class of each document (or event). This system maintains high classification accuracy and provides accurate estimates of document proportions, while achieving reliability levels associated with human efforts. We estimate that it lowers the costs of classifying large numbers of complex documents by 80% or more.