Quasi-Equifrequent Group Generation and Evaluation

Abstract
The frequency of occurrence of unique items in collections such as letters, words and records, together with other statistics derived from it, has recently formed the basis for the design of optimal information structures. A fundamental theorem of information science states that the information-representing capability of a set of symbols is maximized when all symbols in the available set occur with equal probability. Equifrequency, however, is rarely encountered in real applications, and in many cases it is desirable to have sets of items or symbols that are equifrequent within a certain deviation, i.e., quasi-equifrequent. This paper presents an algorithm for generating equifrequent sets, and evaluates and compares the efficiency and accuracy of (a) the entropy and (b) the variance concepts for measuring the degree of quasi-equifrequency in a set. Tests are carried out on the occurrence of the letters A-Z (out of a total of 7,908,100 letters) and of 244 unique subfields (out of a total of 1,113,447 bibliographic record subfields), and an absolutely equifrequent set of subfields is presented.
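The two measures compared in the abstract can be illustrated with a small sketch. The abstract does not give the authors' formulas or their generation algorithm, so everything below is an assumption: `relative_entropy` divides the Shannon entropy of the frequency distribution by its maximum, log2(n), so it equals 1.0 exactly for an equifrequent set; `frequency_variance` is the population variance of the relative frequencies around 1/n, so it equals 0.0 for an equifrequent set. The grouping routine `greedy_groups` is a hypothetical stand-in (the classic "largest item to the lightest group" greedy heuristic), not the paper's algorithm. It merges items into k groups whose total frequencies are as close to equal as the heuristic can make them.

```python
import math
from typing import Dict, List, Sequence, Tuple


def relative_entropy(counts: Sequence[int]) -> float:
    """Shannon entropy of the frequency distribution divided by log2(n).

    Assumes positive counts. Returns 1.0 exactly when all items are
    equifrequent and values below 1.0 as the distribution becomes skewed.
    """
    n = len(counts)
    if n < 2:
        return 1.0  # a single item is trivially equifrequent
    total = sum(counts)
    h = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return h / math.log2(n)


def frequency_variance(counts: Sequence[int]) -> float:
    """Population variance of the relative frequencies p_i around 1/n.

    Returns 0.0 for an equifrequent set; larger values mean greater deviation.
    """
    n = len(counts)
    total = sum(counts)
    return sum((c / total - 1 / n) ** 2 for c in counts) / n


def greedy_groups(freqs: Dict[str, int], k: int) -> Tuple[List[List[str]], List[int]]:
    """Hypothetical grouping: place each item, in descending frequency order,
    into the group whose current total frequency is smallest."""
    groups: List[List[str]] = [[] for _ in range(k)]
    loads = [0] * k
    for item, f in sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0])):
        i = min(range(k), key=lambda j: loads[j])  # first lightest group
        groups[i].append(item)
        loads[i] += f
    return groups, loads


if __name__ == "__main__":
    toy = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4, "f": 2}  # made-up counts
    groups, loads = greedy_groups(toy, 3)
    print(groups, loads)  # [['a', 'f'], ['b', 'e'], ['c', 'd']] [10, 11, 11]
    print(relative_entropy(list(toy.values())), relative_entropy(loads))
    print(frequency_variance(list(toy.values())), frequency_variance(loads))
```

On the toy counts, the grouped totals have a relative entropy much closer to 1.0 and a far smaller variance than the individual items, which is the sense in which a set of groups can be called quasi-equifrequent. Note that the entropy measure is normalized by n while the raw variance still depends on n, which is one reason comparisons between the two concepts need care.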
