Experiments with a component theory of probabilistic information retrieval based on single terms as document components
- 1 October 1990
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Information Systems
- Vol. 8 (4), 363-386
- https://doi.org/10.1145/102675.102677
Abstract
A component theory of information retrieval using single content terms as components of queries and documents was reviewed and tested experimentally. The theory has the advantages of being able to (1) bootstrap itself, that is, define initial term weights naturally based on the fact that items are self-relevant; (2) make use of within-item term frequencies; (3) account for query-focused and document-focused indexing and retrieval strategies cooperatively; and (4) allow for component-specific feedback if such information is available. Retrieval results with four collections support the effectiveness of the first three aspects, except for predictive retrieval. At the initial indexing stage, the retrieval theory performed much more consistently across collections than Croft's model and provided results comparable to Salton's tf*idf approach. An inverse collection term frequency (ICTF) formula was also tested and performed much better than the inverse document frequency (IDF). With full feedback retrospective retrieval, the component theory performed substantially better than Croft's, because of the highly specific nature of document-focused feedback. Repetitive retrieval results with partial relevance feedback mirrored those for retrospective retrieval. However, for the important case of predictive retrieval using residual ranking, results were not unequivocal.
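The abstract contrasts IDF with ICTF but does not give either formula. As a rough illustration only, the sketch below uses the common textbook forms, which are an assumption here and not necessarily the paper's exact definitions: IDF as log(N / df), where df is the number of documents containing the term, and ICTF as log(T / cf), where cf is the term's total occurrence count and T is the total number of tokens in the collection. The function name `idf_and_ictf` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def idf_and_ictf(docs):
    """Compute textbook IDF and ICTF weights for a tokenized collection.

    docs: list of documents, each a list of term strings.
    Returns two dicts keyed by term: (idf, ictf).
    Assumed forms (not taken from the paper):
      idf  = log(number of documents / documents containing the term)
      ictf = log(total tokens in collection / occurrences of the term)
    """
    n_docs = len(docs)
    doc_freq = Counter()   # number of documents containing each term
    coll_freq = Counter()  # total occurrences of each term in the collection
    for tokens in docs:
        coll_freq.update(tokens)
        doc_freq.update(set(tokens))  # count each term once per document
    total_tokens = sum(coll_freq.values())
    idf = {t: math.log(n_docs / doc_freq[t]) for t in doc_freq}
    ictf = {t: math.log(total_tokens / coll_freq[t]) for t in coll_freq}
    return idf, ictf


# Toy collection: "retrieval" occurs twice in the first document.
docs = [
    ["probabilistic", "retrieval", "retrieval"],
    ["term", "weights"],
    ["retrieval", "feedback"],
]
idf, ictf = idf_and_ictf(docs)
for term in sorted(idf):
    print(f"{term}: idf={idf[term]:.3f} ictf={ictf[term]:.3f}")
```

Under these assumed forms, the two weights differ only in what they count: IDF ignores repeated occurrences within a document, whereas ICTF is sensitive to them, which fits the abstract's emphasis on within-item term frequencies.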
This publication has 35 references indexed in Scilit:
- A framework for effective retrieval. ACM Transactions on Database Systems, 1989
- A probabilistic theory of indexing and similarity measure based on cited and citing documents. Journal of the American Society for Information Science, 1985
- On the Construction of Feedback Queries. Journal of the ACM, 1982
- Operations Research Applied to Document Indexing and Retrieval Decisions. Journal of the ACM, 1977
- The Probability Ranking Principle in IR. Journal of Documentation, 1977
- A Theoretical Basis for the Use of Co-occurrence Data in Information Retrieval. Journal of Documentation, 1977
- Precision Weighting: An Effective Automatic Indexing Method. Journal of the ACM, 1976
- Relevance, pertinence and information system development. Information Storage and Retrieval, 1974
- A Statistical Interpretation of Term Specificity and Its Application in Retrieval. Journal of Documentation, 1972
- On Relevance, Probabilistic Indexing and Information Retrieval. Journal of the ACM, 1960