Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach
Open Access
- 25 September 2013
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 8 (9) , e73791
- https://doi.org/10.1371/journal.pone.0073791
Abstract
We analyzed 700 million words, phrases, and topic instances collected from the Facebook messages of 75,000 volunteers, who also took standard personality tests, and found striking variations in language with personality, gender, and age. In our open-vocabulary technique, the data itself drives a comprehensive exploration of language that distinguishes people, finding connections that are not captured with traditional closed-vocabulary word-category analyses. Our analyses shed new light on psychosocial processes yielding results that are face valid (e.g., subjects living in high elevations talk about the mountains), tie in with other research (e.g., neurotic people disproportionately use the phrase ‘sick of’ and the word ‘depressed’), suggest new hypotheses (e.g., an active life implies emotional stability), and give detailed insights (males use the possessive ‘my’ when mentioning their ‘wife’ or ‘girlfriend’ more often than females use ‘my’ with ‘husband’ or ‘boyfriend’). To date, this represents the largest study, by an order of magnitude, of language and personality.
This publication has 74 references indexed in Scilit:
- Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 2013
- Temporal Patterns of Happiness and Information in a Global Social Network: Hedonometrics and Twitter. PLOS ONE, 2011
- Quantitative Analysis of Culture Using Millions of Digitized Books. Science, 2011
- Personality in 100,000 Words: A large-scale analysis of personality and word use among bloggers. Journal of Research in Personality, 2010
- Detecting influenza epidemics using search engine query data. Nature, 2009
- Computational Social Science. Science, 2009
- Revealing dimensions of thinking in open-ended self-descriptions: An automated meaning extraction method for natural language. Published by Elsevier, 2007
- Robust Locally Weighted Regression and Smoothing Scatterplots. Journal of the American Statistical Association, 1979
- Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics, 1970
- Multiple Comparisons among Means. Journal of the American Statistical Association, 1961