A Comparative Evaluation of Full-text, Concept-based, and Context-sensitive Search

Open Access

1 March 2007

journal article
research article
Published by Oxford University Press (OUP) in Journal of the American Medical Informatics Association

Vol. 14 (2), 164-174
https://doi.org/10.1197/jamia.M1953

Abstract

Objectives: To compare (1) concept-based search, which uses documents pre-indexed by a conceptual hierarchy; (2) context-sensitive search, which uses structured, labeled documents; and (3) traditional full-text search. The hypotheses were that (1) more contexts lead to better retrieval accuracy, and (2) adding concept-based search to the other search methods improves on their baseline performance.

Design: We used our Vaidurya architecture to evaluate search and retrieval of structured documents classified by a conceptual hierarchy, on a clinical guidelines test collection.

Measurements: Precision was computed at different levels of recall to assess the contribution of each retrieval method. Precisions were compared at a recall of 0.5 using t-tests.

Results: Performance increased monotonically with the number of query context elements. When context-sensitive elements were added, the mean improvement was 11.1% at recall 0.5. With three contexts, mean query precision was 42% ± 17% (95% confidence interval [CI], 31% to 53%); with two contexts, 32% ± 13% (95% CI, 27% to 38%); and with one context, 20% ± 9% (95% CI, 15% to 24%). Adding context-based queries to full-text queries monotonically improved precision beyond the 0.4 level of recall; the mean improvement was 4.5% at recall 0.5. Adding concept-based search to full-text search improved precision to 19.4% at recall 0.5.

Conclusions: The study demonstrated the usefulness of concept-based and context-sensitive queries for enhancing the precision of retrieval from a digital library of semi-structured clinical guideline documents. Concept-based searches outperformed free-text queries, especially when baseline precision was low. In general, the more ontological elements used in the query, the greater the resulting precision.
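The measurement described above, precision at a fixed level of recall, can be illustrated with a short sketch. It assumes the common interpolated definition (the highest precision at any rank where recall has reached the target level); the abstract does not say which variant the authors used. The function name `precision_at_recall` and the sample ranking are hypothetical and not taken from the study.

```python
from typing import List


def precision_at_recall(ranked_relevance: List[bool],
                        total_relevant: int,
                        recall_level: float) -> float:
    """Interpolated precision at a given recall level.

    ranked_relevance: relevance judgment for each retrieved document,
        in ranked order (True = relevant).
    total_relevant: number of relevant documents in the whole collection.
    recall_level: target recall in [0, 1], e.g. 0.5.

    Returns the highest precision observed at any rank where recall is at
    least recall_level, or 0.0 if that recall is never reached.
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")

    best = 0.0
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
        recall = hits / total_relevant
        if recall >= recall_level:
            best = max(best, hits / rank)
    return best


# Hypothetical ranking for one query: 4 relevant documents exist in total.
ranking = [True, False, True, False, False, True, False, True]
p_at_half = precision_at_recall(ranking, total_relevant=4, recall_level=0.5)
print(f"precision at recall 0.5: {p_at_half:.3f}")
```

In this made-up ranking, recall first reaches 0.5 at rank 3 with two of the three retrieved documents relevant, so the value is 2/3. Averaging this quantity over a set of queries gives mean precisions of the kind reported in the Results, which can then be compared across retrieval methods with a t-test.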
