Displaying the Important Features of Large Collections of Similar Curves

Abstract
Naively displaying a large collection of curves by superimposing them one on another all on the same graph is largely uninformative and aesthetically unappealing. We propose that a simple principal component analysis be used to identify important modes of variation among the curves and that principal component scores be used to identify particular curves which clearly demonstrate the form and extent of that variation. As a result, we obtain a small number of figures on which are plotted a very few “representative” curves from the original collection; these successfully convey the major information present in sets of “similar” curves in a clear and attractive manner. Useful adjunct displays, including the plotting of principal component scores against covariates, are also described. Two examples—one concerning a data-based bandwidth selection procedure for kernel density estimation, the other involving ozone level curve data—illustrate the ideas.
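
As an illustration of the idea only, the following Python sketch assumes the curves are all sampled on a common grid and stored as rows of an array. The function name `representative_curves`, the synthetic data, and the rule of picking the curves with the lowest, median, and highest score on each component are assumptions made here for the example; the abstract does not specify how the representative curves are chosen.

```python
import numpy as np

def representative_curves(curves, n_components=2):
    """Hypothetical helper: PCA on curves sampled on a common grid.

    curves: array of shape (n_curves, n_points).
    Returns the mean curve, the leading principal component curves,
    the principal component scores, and, for each component, the indices
    of the curves with the lowest, median, and highest score.
    """
    X = np.asarray(curves, dtype=float)
    mean_curve = X.mean(axis=0)
    centered = X - mean_curve
    # SVD of the centered data matrix; rows of Vt are the component curves.
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    picks = []
    for k in range(n_components):
        order = np.argsort(scores[:, k])
        # Assumed selection rule: extremes and middle of the score ranking.
        picks.append((order[0], order[len(order) // 2], order[-1]))
    return mean_curve, Vt[:n_components], scores, picks

# Synthetic example (not data from the paper): sine curves whose
# amplitude increases with the curve index, plus a little noise.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 100)
amplitudes = np.linspace(0.5, 1.5, 50)
curves = amplitudes[:, None] * np.sin(2 * np.pi * grid)
curves = curves + 0.01 * rng.standard_normal(curves.shape)

mean_curve, components, scores, picks = representative_curves(curves)
low, mid, high = picks[0]
# Plotting only curves[low], curves[mid], curves[high] on one figure
# shows the main mode of variation without drawing all 50 curves.
```

In this synthetic case the first component tracks amplitude, so the selected curves are the smallest, a middling, and the largest sine wave. Plotting `scores[:, 0]` against a covariate (here `amplitudes`) would correspond to the adjunct displays the abstract mentions.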
