Similarity measures for sets of strings

Abstract
In the companion paper [3], we presented a common basis for many of the similarity and dissimilarity measures involving a pair of strings. In this paper, we extend those results to capture various numerical and nonnumerical measures involving more than two strings. A measure D(X, Y, …, Z) is defined on the set of strings {X, Y, …, Z} in terms of two abstract operators ⊕ and ⊛ and a function δ(·, ·) that has as many arguments as there are strings in the set. Depending on how the operators and δ are chosen, D(X, Y, …, Z) represents various numerical and nonnumerical quantities involving {X, Y, …, Z}, such as the Length of their Longest Common Subsequence (LLCS), the Length of their Shortest Common Supersequence (LSCS), the set of their common subsequences, the set of their common supersequences, and the set of their shuffles. The computational properties of D(X, Y, …, Z) are also discussed.
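To make one of the quantities above concrete, the following is a minimal sketch of computing the LLCS of an arbitrary number of strings. It uses the textbook dynamic-programming recurrence over prefix lengths, not the paper's ⊕/⊛ formulation, which the abstract does not spell out. The function name `llcs` and the helper `solve` are hypothetical names chosen for illustration. Loosely, the `max` and `+ 1` steps play the kind of role that a pair of abstract operators could play, but that correspondence is an assumption rather than something stated in the source.

```python
from functools import lru_cache


def llcs(*strings):
    """Length of the longest common subsequence of all given strings.

    Illustrative sketch only: standard recurrence over prefix lengths,
    memoized. Cost grows with the product of the string lengths.
    """
    if not strings:
        return 0

    @lru_cache(maxsize=None)
    def solve(idx):
        # idx[k] is the length of the prefix of strings[k] being considered.
        if any(i == 0 for i in idx):
            return 0  # an empty prefix shares nothing with the others
        last_symbols = {s[i - 1] for s, i in zip(strings, idx)}
        if len(last_symbols) == 1:
            # Every prefix ends in the same symbol: match it in all strings.
            return solve(tuple(i - 1 for i in idx)) + 1
        # Otherwise drop the last symbol of one string at a time, keep the best.
        return max(
            solve(idx[:k] + (idx[k] - 1,) + idx[k + 1:])
            for k in range(len(idx))
        )

    return solve(tuple(len(s) for s in strings))
```

For example, `llcs("ABCBDAB", "BDCABA")` evaluates the two-string case, and `llcs("geeks", "geeksfor", "geeksforgeeks")` evaluates a three-string case. Because the table has one dimension per string, this sketch is practical only for a small number of short strings.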
