OPTIMAL DEFINITION OF CLASS INTERVALS FOR FREQUENCY TABLES
- 1 July 1983
- journal article
- research article
- Published by Taylor & Francis in Particulate Science and Technology
- Vol. 1 (3) , 281-293
- https://doi.org/10.1080/02726358308906373
Abstract
Data sets are often analyzed in the form of collections of frequency tables (or percentiles derived from equivalent cumulative frequency distributions). Decisions concerning the number of intervals and interval width obviously affect the quality of the data in subsequent analysis. Relying on the basic concepts of information theory, a procedure is presented which evaluates the relative information content of a set of frequency data when subdivided in various manners. Maximum information is always preserved when “maximum entropy” histograms (with unequal class intervals) are used. Evaluation of several schemes of frequency table subdivision (phi-based arithmetic, log arithmetic, Z-score, log Z-score, maximum entropy) indicates that, surprisingly, collections of equal interval phi-based frequency tables contain the least information. Additionally, the concept of the relative entropy of a given collection of frequency tables is defined. The relative entropy is useful as a feature extractor wherein several collections of data with potentially similar information can be compared. An example of using the relative entropy as a feature extractor is given in shape analysis, where the choice of which harmonic(s) represent the greatest shape differences needs to be defined.
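The contrast the abstract draws between equal-width and "maximum entropy" class intervals can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes that information content is measured by the Shannon entropy of the class frequencies, and that a maximum entropy histogram is one whose class boundaries fall at sample quantiles, so every class holds roughly the same number of observations. The function names, the synthetic log-normal sample, and the choice of 10 classes are all hypothetical.

```python
import math
import random


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a frequency table given as class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def equal_width_counts(data, k):
    """Counts for k equal-width class intervals spanning the data range."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        # The maximum value would index class k, so clamp it into the last class.
        i = min(int((x - lo) / width), k - 1)
        counts[i] += 1
    return counts


def equal_frequency_counts(data, k):
    """Counts for k classes bounded by sample quantiles (unequal widths).

    Class i holds the sorted observations with ranks in
    [i * n // k, (i + 1) * n // k), so the counts differ by at most one.
    """
    n = len(data)
    return [(i + 1) * n // k - i * n // k for i in range(k)]


# Synthetic, skewed sample (an assumption for illustration, not data from the paper).
rng = random.Random(0)
data = [rng.lognormvariate(0.0, 1.0) for _ in range(1000)]

k = 10
h_width = shannon_entropy(equal_width_counts(data, k))
h_freq = shannon_entropy(equal_frequency_counts(data, k))

print(f"equal-width entropy:     {h_width:.3f} bits")
print(f"equal-frequency entropy: {h_freq:.3f} bits (upper bound log2(k) = {math.log2(k):.3f})")
```

For a skewed sample like this one, most observations fall into the first few equal-width classes, so the equal-width table carries less entropy than the equal-frequency table, which reaches the upper bound of log2(k) bits when the sample size is divisible by k.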