Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English
- 1 November 2009
- journal article
- Published by Springer Nature in Behavior Research Methods
- Vol. 41 (4) , 977-990
- https://doi.org/10.3758/brm.41.4.977
Abstract
Word frequency is the most important variable in research on word processing and memory. Yet, the main criterion for selecting word frequency norms has been the availability of the measure, rather than its quality. As a result, much research is still based on the old Kučera and Francis frequency norms. By using the lexical decision times of recently published megastudies, we show how bad this measure is and what must be done to improve it. In particular, we investigated the size of the corpus, the language register on which the corpus is based, and the definition of the frequency measure. We observed that corpus size is of practical importance for small sizes (depending on the frequency of the word), but not for sizes above 16–30 million words. As for the language register, we found that frequencies based on television and film subtitles are better than frequencies based on written sources, certainly for the monosyllabic and bisyllabic words used in psycholinguistic research. Finally, we found that lemma frequencies are not superior to word form frequencies in English and that a measure of contextual diversity is better than a measure based on raw frequency of occurrence. Part of the superiority of the latter is due to the words that are frequently used as names. Assembling a new frequency norm on the basis of these considerations turned out to predict word processing times much better than did the existing norms (including Kučera & Francis and Celex). The new SUBTL frequency norms from the SUBTLEXUS corpus are freely available for research purposes from http://brm.psychonomic-journals.org/content/supplemental, as well as from the University of Ghent and Lexique Web sites.
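The contrast the abstract draws between raw frequency of occurrence and contextual diversity can be illustrated with a minimal Python sketch. It assumes that contextual diversity means the share of documents (for example, films) in which a word occurs at least once; the abstract does not spell out the definition, so this is an assumption. The function name `frequency_norms` and the toy corpus are hypothetical and are not part of the published norms.

```python
from collections import Counter


def frequency_norms(documents):
    """Map each word to (occurrences per million tokens, share of documents containing it).

    `documents` is a list of token lists. Both measures are illustrative
    assumptions, not the authors' published procedure.
    """
    total_tokens = sum(len(doc) for doc in documents)
    if total_tokens == 0:
        raise ValueError("corpus contains no tokens")

    word_counts = Counter()  # raw occurrences across the whole corpus
    doc_counts = Counter()   # number of documents containing the word
    for doc in documents:
        word_counts.update(doc)
        doc_counts.update(set(doc))  # count each word once per document

    n_docs = len(documents)
    return {
        word: (count * 1_000_000 / total_tokens, doc_counts[word] / n_docs)
        for word, count in word_counts.items()
    }


# Hypothetical toy corpus: "mark" is repeated often, but only in one document.
toy_corpus = [
    "the dog saw the cat".split(),
    "the cat sat".split(),
    "mark mark mark mark".split(),
]
norms = frequency_norms(toy_corpus)
per_million_mark, diversity_mark = norms["mark"]
per_million_cat, diversity_cat = norms["cat"]
```

In this toy corpus, "mark" has a higher per-million frequency than "cat" but appears in fewer documents, so the two measures rank the words differently. A word used mainly as a name within a single film could plausibly show the same pattern, which is one way to read the abstract's remark about names.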
This publication has 47 references indexed in Scilit:
- Autobiographical elaboration reduces memory distortion: Cognitive operations and the distinctiveness heuristic. Journal of Experimental Psychology: Learning, Memory, and Cognition, 2008
- Pictures of a thousand words: Investigating the neural mechanisms of reading with extremely rapid event-related fMRI. NeuroImage, 2008
- The word grouping hypothesis and eye movements during reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 2008
- The use of film subtitles to estimate word frequencies. Applied Psycholinguistics, 2007
- The English Lexicon Project. Behavior Research Methods, 2007
- Contextual Diversity, Not Word Frequency, Determines Word-Naming and Lexical Decision Times. Psychological Science, 2006
- The processing of singular and plural nouns in French and English. Journal of Memory and Language, 2004
- Disentangling Context Availability and Concreteness in Lexical Decision and Word Translation. The Quarterly Journal of Experimental Psychology Section A, 1998
- The mirror effect in recognition memory. Memory & Cognition, 1985
- Visual duration threshold as a function of word-probability. Journal of Experimental Psychology, 1951