OPTIMAL DEFINITION OF CLASS INTERVALS FOR FREQUENCY TABLES

1 July 1983

journal article
research article
Published by Taylor & Francis in Particulate Science and Technology

Vol. 1 (3), 281-293
https://doi.org/10.1080/02726358308906373

Abstract

Data sets are often analyzed in the form of collections of frequency tables (or percentiles derived from equivalent cumulative frequency distributions). Decisions concerning the number of intervals and the interval width affect the quality of the data in subsequent analysis. Relying on the basic concepts of information theory, a procedure is presented which evaluates the relative information content of a set of frequency data when subdivided in various ways. Maximum information is always preserved when “maximum entropy” histograms (with unequal class intervals) are used. Evaluation of several schemes of frequency table subdivision (phi-based arithmetic, log arithmetic, Z-score, log Z-score, maximum entropy) indicates that, surprisingly, collections of equal-interval phi-based frequency tables contain the least information. Additionally, the concept of the relative entropy of a given collection of frequency tables is defined. The relative entropy is useful as a feature extractor with which several collections of data with potentially similar information can be compared. An example of using the relative entropy as a feature extractor is given from shape analysis, where the harmonic(s) representing the greatest shape differences must be identified.
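
The contrast the abstract draws between equal class intervals and "maximum entropy" (unequal) class intervals can be illustrated with a minimal sketch. It is not the authors' procedure. It assumes that information content is measured by the Shannon entropy of the class frequencies, that maximum entropy intervals are those holding equal numbers of observations (boundaries at sample quantiles), and that relative entropy means the entropy divided by its maximum possible value, log2(k), for k classes. The abstract does not give the paper's own definition, so that last choice in particular is an assumption. All function names are hypothetical.

```python
import math
import random
from bisect import bisect_right


def shannon_entropy(counts):
    """Shannon entropy (bits) of a frequency table given as class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def relative_entropy(counts):
    """Entropy divided by its maximum, log2(k). Assumed definition, see lead-in."""
    return shannon_entropy(counts) / math.log2(len(counts))


def equal_width_edges(data, k):
    """Boundaries of k equal-width class intervals spanning the data range."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    return [lo + j * width for j in range(k + 1)]


def equal_frequency_edges(data, k):
    """Boundaries at sample quantiles, so each class holds about n/k observations."""
    s = sorted(data)
    n = len(s)
    return [s[0]] + [s[i * n // k] for i in range(1, k)] + [s[-1]]


def counts_from_edges(data, edges):
    """Count observations per class; classes are [e_i, e_{i+1}), the last one closed."""
    k = len(edges) - 1
    counts = [0] * k
    for x in data:
        i = min(bisect_right(edges, x) - 1, k - 1)
        counts[i] += 1
    return counts


# Illustrative data only: a skewed (lognormal) sample, not data from the paper.
random.seed(0)
sample = [random.lognormvariate(0.0, 1.0) for _ in range(1000)]
k = 10

equal_width = counts_from_edges(sample, equal_width_edges(sample, k))
max_entropy = counts_from_edges(sample, equal_frequency_edges(sample, k))

print("equal width   :", equal_width, round(relative_entropy(equal_width), 3))
print("equal freq.   :", max_entropy, round(relative_entropy(max_entropy), 3))
```

With a skewed sample, most observations fall into the first few equal-width classes, so that table's entropy is well below log2(k). The equal-frequency table spreads the observations evenly and reaches the maximum, which is the sense in which unequal intervals preserve the most information under these assumptions.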
