Generation of a pseudothesaurus for information retrieval based on cooccurrences and fuzzy set operations
- 1 January 1983
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Systems, Man, and Cybernetics
- Vol. SMC-13 (1), 62-70
- https://doi.org/10.1109/tsmc.1983.6313030
Abstract
A thesaurus in bibliographic information retrieval is a list of technical terms with relations among them, enabling generic retrieval of documents that have different but related keywords. Since constructing a thesaurus is resource-consuming, an automatic method for generating a thesaurus-like structure is needed. A set-theoretical model of an abstract thesaurus is developed and related to an automatic generation method based on cooccurrences of terms in the set of texts. Replacing a basis set in the model and transforming cooccurrence frequencies into fuzzy sets enables the transition from the abstract mathematical model to an actual procedure for automatic generation. The generated structure is called a pseudothesaurus. An algorithm that generates the pseudothesaurus from a large amount of data is developed. Moreover, two examples are given: one based on a dictionary of scientific usage and one based on an actual bibliographic database.
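The abstract does not state the paper's exact formulas, so the following is only a minimal sketch of the general idea of turning term cooccurrences into a fuzzy relation and then into a thesaurus-like structure. It assumes that each document is represented as a set of keywords, that the membership grade of term b in the fuzzy set of terms related to a is the conditional cooccurrence ratio c(a, b) / c(a), and that related terms are selected by an alpha-cut at a threshold `alpha`. All function names and the choice of membership function are hypothetical illustrations, not the authors' method.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count per-term document frequencies and pairwise cooccurrences.

    Assumption: each document is an iterable of keywords; duplicates are ignored.
    """
    single = Counter()
    pair = Counter()
    for doc in docs:
        terms = sorted(set(doc))
        single.update(terms)
        # Each unordered pair (a, b) with a < b is counted once per document.
        for a, b in combinations(terms, 2):
            pair[(a, b)] += 1
    return single, pair


def fuzzy_relation(single, pair):
    """Hypothetical membership grade mu(a, b) = c(a, b) / c(a).

    This is the fraction of documents containing a that also contain b,
    so mu is asymmetric and takes values in [0, 1].
    """
    mu = {}
    for (a, b), c in pair.items():
        mu[(a, b)] = c / single[a]
        mu[(b, a)] = c / single[b]
    return mu


def pseudothesaurus(docs, alpha):
    """Alpha-cut of the fuzzy relation: for each term, the terms with mu >= alpha."""
    single, pair = cooccurrence_counts(docs)
    mu = fuzzy_relation(single, pair)
    related = {t: [] for t in single}
    for (a, b), grade in mu.items():
        if grade >= alpha:
            related[a].append(b)
    return {t: sorted(v) for t, v in related.items()}


if __name__ == "__main__":
    # Hypothetical toy keyword sets, not data from the paper.
    docs = [
        {"retrieval", "thesaurus", "fuzzy"},
        {"retrieval", "thesaurus"},
        {"retrieval", "indexing"},
        {"fuzzy", "set"},
    ]
    for term, terms in sorted(pseudothesaurus(docs, alpha=0.6).items()):
        print(term, "->", terms)
```

With these toy keyword sets and `alpha=0.6`, "thesaurus" is linked to "retrieval" (every document with "thesaurus" also contains "retrieval"), while the reverse link holds only because 2 of the 3 "retrieval" documents contain "thesaurus". Using an asymmetric grade mirrors the broader/narrower direction found in real thesauri; a symmetric measure would be an equally plausible alternative.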