Sampling Reproducibility and Error Estimation in Near Infrared Calibration of Lake Sediments for Water Quality Monitoring

Abstract
This study forms part of a wider project designed to develop methods for routine lake monitoring using near infrared (NIR) spectrometry of surface sediment samples. During calibration, linear relationships (y = Xb + f) between water chemistry variables (y) and the NIR spectra (X) were evaluated by regression analysis. The principal objective of this study was to investigate sources of error both in the X-data (i.e. the NIR spectra), arising from natural variation, sediment sampling, subsequent sample handling and measurement, and in the y-data, here the measured lake-water pH values used in calibration. The error in the NIR spectral data was investigated in two different ways. First, lake-water pH was predicted by a partial least squares (PLS) model derived from triplicate lake sediment spectra, and an analysis of variance (ANOVA) was carried out on the predicted pH. Using this strategy, the within-lake variance of NIR-predicted pH was found to be significantly lower than the between-lake variance at the p = 0.01 significance level. In an alternative approach, lakes that were very similar according to principal component analysis (PCA) score plots were selected, and PLS discriminant analysis (PLS-DA) was used to show that the triplicate sediment spectra from each lake were clearly resolved from the spectra of other lakes. For 33 lakes, multiple pH measurements of their waters allowed estimation of an arithmetic mean and a variance of the y-data for each lake. This variance was pooled over all the lakes and compared to the total variance in the y-variable. For pH, the temporal within-lake variability, pooled over all lakes, accounted for only 1.7% of the between-lake variability. Thus, the sampling strategy and the temporal resolution of measured lake-water pH allow accurate estimates of lake-water pH from NIR spectra.
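The two variance comparisons described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `one_way_anova` and `pooled_within_variance` are hypothetical, the pH values are made up for illustration, and the choice of the variance of the lake means as the "between-lake" denominator is an assumption, since the abstract does not define it precisely.

```python
from statistics import mean, variance


def one_way_anova(groups):
    """Return (ms_between, ms_within, F) for a list of replicate groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Sum of squares between groups (lakes), weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Sum of squares within groups (replicates of the same lake)
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between, ms_within, ms_between / ms_within


def pooled_within_variance(groups):
    """Pool per-group sample variances, weighted by degrees of freedom."""
    dof = sum(len(g) - 1 for g in groups)
    return sum((len(g) - 1) * variance(g) for g in groups) / dof


# Made-up triplicate pH values for three lakes (NOT data from the study)
predicted_ph = [
    [5.1, 5.2, 5.0],
    [6.4, 6.5, 6.3],
    [7.0, 7.2, 7.1],
]

ms_b, ms_w, f_stat = one_way_anova(predicted_ph)

# Assumption: "between-lake variability" is taken as the variance of lake means
within = pooled_within_variance(predicted_ph)
between = variance([mean(g) for g in predicted_ph])
ratio_percent = 100 * within / between

print(f"F = {f_stat:.1f}")
print(f"pooled within-lake variance = {ratio_percent:.2f}% of between-lake variance")
```

The F statistic would then be compared with the critical value of the F distribution for (k - 1, N - k) degrees of freedom at the chosen significance level; that lookup is omitted here to keep the sketch free of third-party dependencies.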