Stepwise canonical discriminant analysis of continuous digitalized signals: Application to chromatograms of wheat proteins

Abstract
Continuous digitalized signals such as spectra, electrophoregrams or chromatograms generally comprise a large number of data points and contain redundant information. Discriminant analysis is therefore difficult to perform on them without a preliminary selection of variables. A procedure for applying canonical discriminant analysis (CDA) to this kind of data is studied. CDA can be presented as a succession of two principal component analyses (PCAs). The first is performed directly on the raw data and gives PC scores. The second is applied to the gravity centres of the qualitative groups, computed from the normalized PC scores. A stepwise procedure for selecting the relevant PC scores is presented. The method was tested on an illustrative collection of 165 size‐exclusion high‐performance liquid chromatography (SE‐HPLC) chromatograms of wheat proteins from 55 genotypes grown in three locations. The growing locations were discriminated using seven to nine PC scores, with more than 86% of the samples correctly classified in both the training sets and the verification sets. The genotypes were also rather well identified, with more than 85% of the samples correctly classified. The method provides a way of assessing relevant mathematical distances between digitalized signals according to qualitative knowledge of the samples.
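
The two-PCA view of CDA and the stepwise choice of PC scores can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the synthetic Gaussian "chromatograms", the nearest-centre classification rule and the selection criterion (greedy forward selection on training accuracy) are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def pca_scores(X, n_components):
    """First PCA: centre the raw signals and return PC scores
    normalized to unit sample variance."""
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    # Columns of U have unit norm and zero mean, so this scaling
    # gives every retained score a sample variance of 1.
    return U[:, :n_components] * np.sqrt(X.shape[0] - 1)

def cda(Z, labels):
    """Second PCA: applied to the group gravity centres computed
    from the normalized scores Z (Z is centred over all samples)."""
    groups = np.unique(labels)
    centres = np.array([Z[labels == g].mean(axis=0) for g in groups])
    counts = np.array([(labels == g).sum() for g in groups])
    # Weight each centre by its group size (assumed weighting).
    weighted = centres * np.sqrt(counts)[:, None]
    _, _, Vt = np.linalg.svd(weighted, full_matrices=False)
    k = min(len(groups) - 1, Z.shape[1])
    axes = Vt[:k].T
    return axes, centres @ axes, groups

def classify(Z, axes, centre_coords, groups):
    """Assign each sample to the nearest gravity centre in the
    discriminant space (assumed classification rule)."""
    coords = Z @ axes
    d = np.linalg.norm(coords[:, None, :] - centre_coords[None, :, :], axis=2)
    return groups[d.argmin(axis=1)]

def stepwise_select(Z, labels, max_scores):
    """Greedy forward selection of PC scores; the criterion
    (training accuracy) is an assumption, not taken from the paper."""
    selected, best_acc = [], 0.0
    while len(selected) < max_scores:
        trials = []
        for j in range(Z.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            axes, cc, groups = cda(Z[:, cols], labels)
            acc = float(np.mean(classify(Z[:, cols], axes, cc, groups) == labels))
            trials.append((acc, j))
        if not trials:
            break
        # Best accuracy first; ties go to the lower-order PC.
        acc, j = max(trials, key=lambda t: (t[0], -t[1]))
        if acc <= best_acc:
            break  # no improvement: stop adding scores
        selected.append(j)
        best_acc = acc
    return selected, best_acc

# Synthetic stand-in for chromatograms: 3 groups x 20 signals, 50 points each.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
labels = np.repeat(np.array([0, 1, 2]), 20)
peak_pos = np.array([0.3, 0.5, 0.7])[labels]
X = np.exp(-((t[None, :] - peak_pos[:, None]) / 0.1) ** 2)
X = X + 0.05 * rng.normal(size=X.shape)

Z = pca_scores(X, n_components=10)
selected, accuracy = stepwise_select(Z, labels, max_scores=5)
print("selected PC scores:", selected, "training accuracy:", accuracy)
```

Normalizing the PC scores to unit variance before computing the gravity centres means that Euclidean distances in the second PCA behave like Mahalanobis distances with respect to the total covariance of the retained scores, which is what lets a plain PCA of the centres act as a discriminant analysis. The sketch reports training accuracy only; a study like the one described would also evaluate a separate verification set.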