Dictionary buildup and stability of word frequency in a specialized medical area
- 1 January 1967
- journal article
- research article
- Published by Wiley in American Documentation
- Vol. 18 (1), 21-25
- https://doi.org/10.1002/asi.5090180105
Abstract
This is a report of word usage in radiological (x‐ray) patient records as found in a 5% sample of the annual case load at UAMC including 100,000 words. Records were taken exactly as dictated. The study is part of an effort to develop an IR system for patient data. The system “autocodes” (automatically stores) the physician's dictated findings and diagnoses in such a fashion that they can be retrieved again automatically.
Some of our findings approximate results reported in the literature. For example, the rate of introduction of new different words levels off to about 2,500 words when 40,000 to 50,000 words of text have been analyzed. However, unclassified words continue to occur at a significant level of almost 2% at the 100,000 word level, with a 1% noise level.
Attempts to establish the rank order of words beyond the first several hundred have failed because about 70% of the words appear to occur with such a low relative frequency (no more than one time in 10,000). Thus, establishing files by rank order appears impractical, even though filter lists (discard words) by rank groups (words with nearly the same relative frequency) are quite practical.
Additional data are presented and design implications are discussed.
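The three measurements the abstract describes (growth of the set of different words as text accumulates, the share of words with very low relative frequency, and rank groups of words with the same frequency) can be illustrated with a minimal Python sketch. This is not the authors' procedure: the function names, the whitespace tokenization, and the toy sentence are assumptions made for illustration only, and grouping by exact count is a simplification of "nearly the same relative frequency".

```python
from collections import Counter


def vocabulary_growth(tokens, step):
    """Return (tokens_seen, distinct_words) pairs sampled every `step` tokens."""
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        # Record a point at each step boundary and at the end of the text.
        if i % step == 0 or i == len(tokens):
            curve.append((i, len(seen)))
    return curve


def low_frequency_share(tokens, threshold=1e-4):
    """Fraction of distinct words whose relative frequency is <= threshold.

    The default of 1e-4 mirrors "no more than one time in 10,000".
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    rare = sum(1 for c in counts.values() if c / total <= threshold)
    return rare / len(counts)


def rank_groups(tokens):
    """Group words that share the same count, most frequent group first."""
    counts = Counter(tokens)
    groups = {}
    for word, c in counts.items():
        groups.setdefault(c, []).append(word)
    return [(c, sorted(ws)) for c, ws in sorted(groups.items(), reverse=True)]


# Toy input (hypothetical, not data from the study).
tokens = "the chest is clear the heart is normal the chest".split()
growth = vocabulary_growth(tokens, step=5)
share = low_frequency_share(tokens, threshold=0.1)
groups = rank_groups(tokens)
```

On a real corpus, a flattening `vocabulary_growth` curve would correspond to the leveling off the abstract reports, and the top entries of `rank_groups` are the natural candidates for a discard-word filter list.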
This publication has 3 references indexed in Scilit:
- COMPUTER AUTOCODING, SELECTING AND CORRELATING OF RADIOLOGIC DIAGNOSTIC CASES. American Journal of Roentgenology, 1966
- A Dictionary for Minimum Redundancy Encoding. Journal of the ACM, 1963
- Relativ Frequency of English Speech Sounds. Harvard University Press, 1923