A Zipfian Model of an Automatic Bibliographic System: An Application to MEDLINE
- 1 July 1982
- journal article
- research article
- Published by Wiley in Journal of the American Society for Information Science
- Vol. 33 (4), 223-232
- https://doi.org/10.1002/asi.4630330406
Abstract
A Zipfian model of an automatic bibliographic system is developed using parameters describing the contents of its database and its inverted file. The underlying structure of the Zipf distribution is derived, with particular emphasis on its application to word frequencies, especially with regard to the inverted files of an automatic bibliographic system. Andrew Booth developed a form of Zipf's law which estimates the number of words of a particular frequency for a given author and text. His formulation has been adopted as the basis of a model of term dispersion in an inverted file system. The model is also distinctive in its consideration of the proliferation of spelling errors in free text, and the inclusion of all searchable elements from the system's inverted file. This model is applied to the National Library of Medicine's MEDLINE. The model carries implications for the determination of database storage requirements, search response time, and search exhaustiveness.
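The abstract does not state Booth's formula. As a rough illustration only, the sketch below uses a commonly cited form of Booth's low-frequency law, I_n / I_1 = 2 / (n(n + 1)), where I_n is the number of distinct words occurring exactly n times. Whether the article adopts exactly this parameterization is an assumption, and the function name and sample figures are hypothetical.

```python
def booth_expected_counts(i1: float, max_n: int) -> dict[int, float]:
    """Estimate how many distinct words occur exactly n times, for n = 1..max_n.

    Assumes the commonly cited form of Booth's law, I_n = I_1 * 2 / (n * (n + 1)),
    where i1 is the number of words that occur exactly once. This form is an
    assumption; the abstract itself gives no formula.
    """
    return {n: i1 * 2.0 / (n * (n + 1)) for n in range(1, max_n + 1)}


# Hypothetical input: 1000 terms that appear only once in an inverted file.
counts = booth_expected_counts(1000, 4)
for n, expected in counts.items():
    print(f"terms occurring {n} time(s): about {expected:.1f}")
```

Under this form the estimated counts fall off quickly with n, so most distinct terms in an inverted file would be expected to have very few postings.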
This publication has 7 references indexed in Scilit:
- Automatic detection and correction of spelling errors in a large data base. Journal of the American Society for Information Science, 1980
- Frequency and impact of spelling errors in bibliographic data bases. Information Processing & Management, 1977
- Indexing consistency and quality. American Documentation, 1969
- A “Law” of occurrences for words of low frequency. Information and Control, 1967
- Distribution of indexing terms for maximum efficiency of information transmission. American Documentation, 1967
- The distribution of term usage in manipulative indexes. American Documentation, 1964
- Little Science, Big Science. Columbia University Press, 1963