Frequency of occurrence of two- and three-word sequences in English

Abstract
We analyzed a large corpus of written English to determine the most common two-word and three-word sequences and their frequencies. We excluded sequences containing internal punctuation in an attempt to limit the search to sequences likely to be pronounced as a unit. The results have obvious applications to word-based speech synthesis; we expect our frequency tables to be useful to linguists, psychologists, and workers in automatic speech recognition as well.
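The punctuation rule described above can be illustrated with a short Python sketch. It counts n-word sequences only within stretches of text that contain no punctuation, so in "of the people, by the people" the pair "people by" is never counted. The tokenization (letters and apostrophes only, case folded, any other character treated as a break) and the function name `count_sequences` are assumptions made for illustration; this is not the procedure actually used in the study.

```python
import re
from collections import Counter


def count_sequences(text, n):
    """Count n-word sequences that do not span punctuation (hypothetical helper)."""
    counts = Counter()
    # Assumption: any character other than a letter, apostrophe, or whitespace
    # ends a punctuation-free segment, so no counted sequence crosses it.
    for segment in re.split(r"[^A-Za-z'\s]+", text):
        # Assumption: words are compared case-insensitively, and stray
        # leading/trailing apostrophes (e.g. quote marks) are dropped.
        words = [w.strip("'") for w in segment.lower().split()]
        words = [w for w in words if w]
        # Slide an n-word window across the segment only.
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    sample = "Of the people, by the people, for the people."
    print(count_sequences(sample, 2).most_common(3))
    print(count_sequences(sample, 3).most_common(3))
```

On the sample sentence, ("the", "people") is counted three times, while ("people", "by") and ("people", "for") are never counted because a comma separates them. A real corpus study would also need decisions about hyphens, numerals, and abbreviations, which this sketch simply treats as breaks.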