PubMed related articles: a probabilistic topic-based model for content similarity
Open Access
- 30 October 2007
- journal article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 8 (1), 423
- https://doi.org/10.1186/1471-2105-8-423
Abstract
We present a probabilistic topic-based model for content similarity called pmra that underlies the related article search feature in PubMed. Whether a document is about a particular topic is computed from term frequencies, which are modeled as Poisson distributions. Unlike previous probabilistic retrieval models, we do not attempt to estimate relevance; rather, our focus is "relatedness", the probability that a user would want to examine a particular document given known interest in another. We also describe a novel technique for estimating parameters that does not require human relevance judgments; instead, the process is based on the existence of MeSH® in MEDLINE®.
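As a rough illustration of how term frequencies modeled as Poisson distributions can indicate whether a document is about a topic, the sketch below applies Bayes' rule to a generic two-Poisson mixture. It is not the pmra formulation itself, which the abstract does not spell out. The function names, the two rates, and the prior are all hypothetical values chosen for the example.

```python
import math


def poisson_pmf(count: int, rate: float) -> float:
    """Probability of observing `count` occurrences under a Poisson with mean `rate`."""
    return math.exp(-rate) * rate ** count / math.factorial(count)


def prob_about_topic(count: int, rate_about: float, rate_not_about: float,
                     prior_about: float) -> float:
    """Posterior probability that a document is about a topic, given a term count.

    Assumes (for illustration only) that the term count follows one Poisson
    distribution in documents about the topic and another in all other documents.
    """
    weight_about = prior_about * poisson_pmf(count, rate_about)
    weight_not_about = (1.0 - prior_about) * poisson_pmf(count, rate_not_about)
    return weight_about / (weight_about + weight_not_about)


# Hypothetical parameters: the term averages 3 occurrences in on-topic documents,
# 0.5 elsewhere, and 10% of documents are assumed to be on-topic.
posteriors = [prob_about_topic(k, 3.0, 0.5, 0.1) for k in range(6)]
for k, p in enumerate(posteriors):
    print(f"term count {k}: P(about topic) = {p:.3f}")
```

With these assumed parameters, a term that never occurs pushes the posterior below the prior, and each additional occurrence raises it. The abstract describes estimating parameters from MeSH in MEDLINE rather than from human relevance judgments; here they are simply fixed by hand.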