Outlier Detection in Multivariate Analytical Chemical Data

Abstract
The unreliability of multivariate outlier detection techniques such as Mahalanobis distance and hat matrix leverage has been known in the statistical community for well over a decade. However, only within the past few years has a serious effort been made to introduce robust methods for the detection of multivariate outliers into the chemical literature. Techniques such as the minimum volume ellipsoid (MVE), multivariate trimming (MVT), and M-estimators (e.g., PROP), along with related methods such as the minimum covariance determinant (MCD), rely on algorithms that are difficult to program and may require significant processing time. While MCD and MVE have been shown to be statistically sound, we found MVT unreliable because the method uses the Mahalanobis distance measure in its initial step. We examined the performance of MCD and MVT on selected data sets and in simulations and compared the results with those of two methods of our own devising. Both proposed methods, resampling by the half-means method and the smallest half-volume method, are simple to use, are conceptually clear, and provide results superior to those of MVT and of the current best-performing technique, MCD. Either proposed method is recommended for the detection of multiple outliers in multivariate data.
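
As an illustration of why the classical Mahalanobis distance can be an unreliable starting point, the following minimal Python sketch computes the distance of each sample from the sample mean using the sample covariance. It is not taken from the work summarized above: the function name mahalanobis_distances, the simulated data, and the use of NumPy are assumptions made purely for illustration. Because the mean and covariance are themselves estimated from data that include the outliers, the distance of any single point cannot exceed (n - 1)/sqrt(n), however far that point lies from the bulk of the data.

```python
import numpy as np


def mahalanobis_distances(X):
    """Classical Mahalanobis distance of each row of X from the sample mean.

    Hypothetical helper for illustration; uses the ordinary (non-robust)
    sample mean and the sample covariance with an n - 1 denominator.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)          # p x p sample covariance
    inv_cov = np.linalg.inv(cov)
    # Quadratic form (x - mean)^T S^{-1} (x - mean), evaluated row by row.
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.sqrt(d2)


# Illustrative data (assumed, not from the paper): 20 well-behaved points
# plus one gross outlier placed at increasing distances from the origin.
rng = np.random.default_rng(0)
clean = rng.normal(size=(20, 2))
for offset in (10.0, 1000.0):
    X = np.vstack([clean, [[offset, offset]]])
    d = mahalanobis_distances(X)
    n = X.shape[0]
    print(f"outlier at {offset}: distance {d[-1]:.3f}, "
          f"upper bound {(n - 1) / np.sqrt(n):.3f}")
```

In this sketch the distance reported for the outlier stays below the same bound whether the point is placed 10 or 1000 units away, since the outlier inflates the covariance estimate used to measure it. This is the kind of behavior that motivates robust estimates of location and scatter.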