Genetic Algorithm-Based Protocol for Coupling Digital Filtering and Partial Least-Squares Regression: Application to the Near-Infrared Analysis of Glucose in Biological Matrices

Abstract
A multivariate calibration procedure is described in which a genetic algorithm (GA) guides the coupling of bandpass digital filtering and partial least-squares (PLS) regression. The protocol is developed using near-infrared spectroscopic measurements of glucose in three different biological matrices. The GA optimizes the position and width of the bandpass digital filter, the spectral range for PLS regression, and the number of PLS factors used in building the calibration model. Optimizing these variables is difficult because they are expressed in different units, resulting in a tendency for local optima to occur on the response surface of the optimization. Two issues are found to be critical to the success of the optimization: the configuration of the GA and the development of an appropriate fitness function. An integer representation for the GA is employed to overcome the difficulty in optimizing dissimilar variables, and the optimal GA configuration is found through experimental design methods. Three fitness function calculations are compared for their ability to lead the GA to better calibration models. The best-performing fitness function combines the mean-squared error in the calibration set data, the mean-squared error in the monitoring set data, and the number of PLS factors raised to the power of a weighting factor. Multiple random drawings of the calibration and monitoring sets are also found to improve the optimization performance. Using this fitness function and three random drawings of the calibration and monitoring sets, the GA found calibration models that required fewer PLS factors yet had similar or better prediction abilities compared to calibration models found through an optimization protocol based on a grid search method.
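The optimization described above can be illustrated with a minimal Python sketch. The abstract does not give the exact fitness formula, so the form used here, the reciprocal of the summed calibration and monitoring mean-squared errors multiplied by the number of PLS factors raised to a weight `w`, is an assumption chosen only to show how the three quantities might be combined. The `Candidate` fields, the function names, and the default `w = 0.5` are likewise hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Candidate:
    """Integer-coded GA chromosome (all field names are hypothetical)."""
    filter_position: int  # index of the bandpass center on a discrete grid
    filter_width: int     # index of the bandpass width on a discrete grid
    range_start: int      # first spectral point passed to PLS
    range_end: int        # last spectral point passed to PLS
    n_factors: int        # number of PLS factors in the model


def fitness(mse_cal: float, mse_mon: float, n_factors: int, w: float = 0.5) -> float:
    """Assumed form: 1 / ((MSE_cal + MSE_mon) * n_factors ** w).

    Larger values are better; extra PLS factors are penalized through w.
    """
    if n_factors < 1:
        raise ValueError("n_factors must be at least 1")
    return 1.0 / ((mse_cal + mse_mon) * n_factors ** w)


def fitness_over_drawings(errors, n_factors: int, w: float = 0.5) -> float:
    """Average the fitness over several random calibration/monitoring splits.

    errors: iterable of (mse_cal, mse_mon) pairs, one pair per random drawing.
    """
    return mean(fitness(c, m, n_factors, w) for c, m in errors)


# Illustrative numbers only: two candidates with identical errors
# that differ in the number of PLS factors.
small = Candidate(filter_position=12, filter_width=3,
                  range_start=40, range_end=180, n_factors=4)
large = Candidate(filter_position=12, filter_width=3,
                  range_start=40, range_end=180, n_factors=16)
errors = [(1.0, 1.0), (0.5, 0.5), (1.0, 1.0)]  # three random drawings

score_small = fitness_over_drawings(errors, small.n_factors)
score_large = fitness_over_drawings(errors, large.n_factors)
```

With equal errors, the candidate using fewer PLS factors receives the higher score under this assumed form, which is the kind of pressure toward parsimonious models that the abstract reports. Because every gene is an integer index, variables with different physical units can share one chromosome without rescaling.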