Latent semantic indexing is an optimal special case of multidimensional scaling

Abstract
Latent Semantic Indexing (LSI) is a technique for representing documents, queries, and terms as vectors in a multidimensional real-valued space. The representations are approximations to the original term-space encoding and are found using the matrix technique of Singular Value Decomposition. In comparison, Multidimensional Scaling (MDS) is a class of data analysis techniques for representing data points as points in a multidimensional real-valued space. The objects are positioned so that inter-point similarities in the space match the inter-object similarity information provided by the researcher. We illustrate how the document representations given by LSI are equivalent to the optimal representations found when solving a particular MDS problem in which the given inter-object similarity information is the inner-product similarities between the documents themselves. We further analyze a more general MDS problem in which the inter-document similarity information, although still in inner-product form, is arbitrary with respect to the vector space encoding of the documents.
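
The following is a minimal numerical sketch (not from the paper) of the equivalence described in the abstract: the rank-k LSI document representation obtained from the SVD of the term-document matrix reproduces, up to a rotation of the axes, the classical MDS solution computed from the inner-product (Gram) similarities between the documents. The matrix sizes and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((8, 5))          # term-document matrix: 8 terms, 5 documents (toy example)
k = 2                           # target dimensionality of the reduced space

# LSI: truncated SVD of A; documents are represented by the rows of V_k * S_k.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
lsi_docs = Vt[:k].T * s[:k]     # shape (5, k)

# Classical MDS on inner-product similarities S = A^T A:
# eigendecompose S and keep the top-k eigenvectors, scaled by sqrt(eigenvalues).
S = A.T @ A
w, V = np.linalg.eigh(S)
order = np.argsort(w)[::-1][:k]
mds_docs = V[:, order] * np.sqrt(w[order])   # shape (5, k)

# The two embeddings agree up to sign/rotation of the axes,
# so all inter-document inner products coincide.
print(np.allclose(lsi_docs @ lsi_docs.T, mds_docs @ mds_docs.T))  # True
```

Because the eigenvectors of A^T A are exactly the right singular vectors of A (with eigenvalues equal to the squared singular values), the MDS coordinates are the LSI coordinates up to sign, which is the special case highlighted in the title.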
