Rank Order Distributions and Secondary Key Indexing

Abstract
The performance of a secondary index depends greatly upon the distribution of secondary key values, especially when these are not unique. The nature of these distributions is discussed and a model for the minimum indexing time is proposed. Normally, at the time the database is designed, little is known about the nature of the data to be stored. A technique is described for modelling the underlying distribution of a secondary key population, based on a small sample from that population. Alternative indexing strategies may be compared on the basis of this model distribution at an early stage of design. Possible strategies for improving indexing performance are discussed.
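The abstract does not spell out the model, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the secondary key values in a small sample follow a Zipf-like power law in rank order, f(r) ≈ C·r^(−s). This assumption is suggested by the title, not confirmed by the abstract. The sketch fits C and s by least squares on log f against log r. As an illustrative cost proxy, it computes the mean posting-list length seen by an equality lookup on the key of a randomly chosen record, Σf²/Σf. The function names `rank_order`, `fit_zipf_exponent` and `mean_postings` are hypothetical, and the proxy is not the paper's minimum-indexing-time model.

```python
from collections import Counter
import math


def rank_order(sample):
    """Return key frequencies in descending order (rank 1 = most frequent key)."""
    return sorted(Counter(sample).values(), reverse=True)


def fit_zipf_exponent(freqs):
    """Least-squares fit of log f(r) = log C - s * log r.

    ASSUMPTION: a Zipf-like power law is the model distribution.
    Returns (C, s). Needs at least two distinct keys.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two distinct key values to fit a slope")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def mean_postings(freqs):
    """Average posting-list length for a lookup on a uniformly chosen record's key.

    A key with frequency f is hit with probability f / N and returns f postings,
    so the expectation is sum(f^2) / N. Illustrative cost proxy only.
    """
    return sum(f * f for f in freqs) / sum(freqs)


if __name__ == "__main__":
    # Hypothetical sample of secondary key values.
    sample = ["red"] * 40 + ["blue"] * 21 + ["green"] * 13 + ["black"] * 10 + ["white"] * 8
    freqs = rank_order(sample)
    c, s = fit_zipf_exponent(freqs)
    print(f"ranked frequencies: {freqs}")
    print(f"fitted C = {c:.2f}, s = {s:.3f}")
    print(f"mean postings per lookup: {mean_postings(freqs):.2f}")
```

A fitted exponent near zero suggests near-uniform key frequencies, where short posting lists make a secondary index attractive. A large exponent signals a few very frequent values, whose long posting lists dominate lookup cost.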