Diatoms and pH reconstruction

Abstract
Palaeolimnological diatom data comprise counts of many species, expressed as percentages for each sample. Reconstruction of past lake-water pH from such data involves two steps: (i) regression, where the responses of modern diatom abundances to pH are modelled, and (ii) calibration, where the modelled responses are used to infer pH from diatom assemblages preserved in lake sediments. In view of the highly multivariate nature of diatom data, the strongly nonlinear response of diatoms to pH, and the abundance of zero values in the data, a compromise between ecological realism and computational feasibility is essential. The two numerical approaches used are (i) the computationally demanding but formal statistical approach of maximum likelihood (ML) Gaussian logit regression and calibration and (ii) the computationally straightforward but heuristic approach of weighted averaging (WA) regression and calibration. When the Surface Water Acidification Project (SWAP) modern training set of 178 lakes is reduced by data-screening to 167 lakes, WA gives superior results, with the lowest root mean squared errors of prediction in cross-validation. Bootstrapping is also used to derive prediction errors, not only for the training set as a whole but also for individual pH reconstructions by WA for stratigraphic samples from Round Loch of Glenhead, southwest Scotland, covering the last 10 000 years. These reconstructions are evaluated in terms of lack-of-fit to pH and analogue measures and are interpreted in terms of rate of change by using bootstrapping of the reconstructed pH time-series.
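
The two WA steps can be illustrated with a minimal sketch. It is not the procedure used in the study: the function names (wa_optima, wa_infer, loo_rmsep) and the three-lake training set are hypothetical, the data are invented for illustration, and the deshrinking step that WA implementations commonly apply is omitted. In the regression step each taxon's pH optimum is taken as the abundance-weighted mean of the pH of the lakes in which it occurs; in the calibration step the inferred pH of a sample is the abundance-weighted mean of the optima of the taxa it contains. A leave-one-out loop gives a simple cross-validated root mean squared error of prediction.

```python
import math

# Minimal weighted-averaging (WA) sketch. Toy data and hypothetical names;
# no deshrinking is applied.

def wa_optima(abundances, ph):
    """WA regression: optimum of taxon k = sum_i(y_ik * x_i) / sum_i(y_ik)."""
    n_taxa = len(abundances[0])
    optima = []
    for k in range(n_taxa):
        total = sum(row[k] for row in abundances)
        if total == 0:
            optima.append(None)  # taxon absent from the training set
            continue
        weighted = sum(row[k] * x for row, x in zip(abundances, ph))
        optima.append(weighted / total)
    return optima


def wa_infer(sample, optima):
    """WA calibration: inferred pH = sum_k(y_k * u_k) / sum_k(y_k)."""
    pairs = [(y, u) for y, u in zip(sample, optima) if u is not None and y > 0]
    total = sum(y for y, _ in pairs)
    if total == 0:
        raise ValueError("sample contains no taxa with estimated optima")
    return sum(y * u for y, u in pairs) / total


def loo_rmsep(abundances, ph):
    """Leave-one-out cross-validation: refit optima without lake i,
    predict lake i, and return the root mean squared error of prediction."""
    squared_errors = []
    for i in range(len(ph)):
        train = abundances[:i] + abundances[i + 1:]
        train_ph = ph[:i] + ph[i + 1:]
        predicted = wa_infer(abundances[i], wa_optima(train, train_ph))
        squared_errors.append((predicted - ph[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy training set: 3 lakes x 3 taxa (percentages) with measured pH.
train = [
    [80.0, 20.0, 0.0],
    [10.0, 60.0, 30.0],
    [0.0, 20.0, 80.0],
]
train_ph = [4.5, 5.5, 6.5]

optima = wa_optima(train, train_ph)
fossil = [50.0, 50.0, 0.0]          # hypothetical sediment sample
inferred = wa_infer(fossil, optima)
rmsep = loo_rmsep(train, train_ph)
```

Because averages are taken twice, once for the optima and once for the inferred values, WA estimates are pulled towards the mean of the training-set pH; this is the shrinkage that a deshrinking regression would correct in a fuller implementation.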