Self organization of a massive document collection
- 1 May 2000
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Neural Networks
- Vol. 11 (3) , 574-585
- https://doi.org/10.1109/72.846729
Abstract
This paper describes the implementation of a system that organizes vast document collections according to textual similarities. The system is based on the self-organizing map (SOM) algorithm. Statistical representations of the documents' vocabularies serve as the feature vectors. The main goal of our work has been to scale up the SOM algorithm so that it can handle large amounts of high-dimensional data. In a practical experiment we mapped 6,840,568 patent abstracts onto a 1,002,240-node SOM. The feature vectors were 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms.
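The two ideas named in the abstract, random projection of a weighted word histogram and mapping the result onto a SOM, can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian random matrix, the Gaussian neighborhood function, the sparse-dictionary histogram format, and all function names below are assumptions chosen for illustration, and none of the paper's scaling techniques are reproduced.

```python
import math
import random


def random_projection_matrix(vocab_size, target_dim, seed=0):
    """Build a target_dim x vocab_size matrix of Gaussian random entries (assumed distribution)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
            for _ in range(target_dim)]


def project(histogram, matrix):
    """Project a sparse weighted word histogram {word_index: weight} to a dense low-dimensional vector."""
    return [sum(row[j] * w for j, w in histogram.items()) for row in matrix]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(nodes, x):
    """Return the index of the node whose model vector is closest to x."""
    return min(range(len(nodes)), key=lambda i: squared_distance(nodes[i], x))


def som_step(nodes, grid_width, x, learning_rate, radius):
    """One SOM update: move the winner and its grid neighbors toward x.

    nodes is a flat list of model vectors laid out row by row on a
    rectangular grid with grid_width columns (a hypothetical layout).
    """
    bmu = best_matching_unit(nodes, x)
    bmu_row, bmu_col = divmod(bmu, grid_width)
    for i, node in enumerate(nodes):
        row, col = divmod(i, grid_width)
        grid_d2 = (row - bmu_row) ** 2 + (col - bmu_col) ** 2
        # Gaussian neighborhood: 1.0 at the winner, decaying with grid distance.
        h = math.exp(-grid_d2 / (2.0 * radius ** 2))
        for k in range(len(node)):
            node[k] += learning_rate * h * (x[k] - node[k])
    return bmu


# Toy usage: a 20-word vocabulary projected to 5 dimensions, on a 3 x 3 map.
matrix = random_projection_matrix(vocab_size=20, target_dim=5, seed=1)
doc = {2: 0.7, 5: 1.3, 11: 0.4}  # hypothetical weighted word histogram
vector = project(doc, matrix)

rng = random.Random(2)
nodes = [[rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(9)]
winner = som_step(nodes, grid_width=3, x=vector, learning_rate=0.5, radius=1.0)
```

In the experiment reported above the projected vectors have 500 dimensions and the map has 1,002,240 nodes; the toy sizes here only keep the sketch fast to run.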
This publication has 33 references indexed in Scilit:
- Text classification with self-organizing maps: Some lessons learned. Neurocomputing, 1998
- Information visualization for collaborative computing. Computer, 1998
- Self-Organizing Maps of Very Large Document Collections: Justification for the WEBSOM Method. Published by Springer Nature, 1998
- Map displays for information retrieval. Journal of the American Society for Information Science, 1997
- Internet Categorization and Search: A Self-Organizing Approach. Journal of Visual Communication and Image Representation, 1996
- Neural networks and information extraction in astronomical information retrieval. Vistas in Astronomy, 1996
- Indexing by latent semantic analysis. Journal of the American Society for Information Science, 1990
- Self-organizing semantic maps. Biological Cybernetics, 1989
- Vector quantization in speech coding. Proceedings of the IEEE, 1985
- Vector quantization. IEEE ASSP Magazine, 1984