The Remedian: A Robust Averaging Method for Large Data Sets

Abstract
It is often assumed that computing a robust estimator on n data values requires at least n storage elements (unlike the sample average, which may be calculated with an updating mechanism). This is one of the main reasons why robust estimators are seldom used for large data sets and why they are not included in most statistical packages. We introduce a new estimator that takes up little storage space, investigate its statistical properties, and provide an example of real-time curve "averaging" in a medical context. The remedian with base b proceeds by computing medians of groups of b observations, and then medians of these medians, until only a single estimate remains. This method needs only k arrays of size b (where n = b^k), so the total storage is O(log n) for fixed b or, alternatively, O(n^(1/k)) for fixed k. Its storage economy makes it useful for robust estimation in large databases, for real-time engineering applications in which the data themselves are not stored, and for resistant "averaging" of curves or images. The method is equivariant under monotone transformations. Optimal choices of b with respect to storage and finite-sample breakdown are derived. The remedian is shown to be a consistent estimator of the population median, and it converges at a nonstandard rate to a median-stable distribution.
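The recursive median-of-medians scheme described above can be sketched as a small streaming class. This is a minimal illustration, not the paper's implementation: it assumes exactly n = b^k observations arrive one at a time, and the class and method names are invented for the example. Each of the k buffers holds at most b values, so total storage is b*k.

```python
import statistics

class Remedian:
    """Streaming remedian with base b and k levels.

    Holds k buffers of size b, so storage is b*k = O(log n)
    for n = b**k observations (a sketch; assumes n is exactly b**k).
    """

    def __init__(self, b, k):
        self.b, self.k = b, k
        self.buffers = [[] for _ in range(k)]  # buffers[i]: medians at level i
        self.result = None  # set once all b**k observations are seen

    def push(self, x):
        self._insert(x, 0)

    def _insert(self, x, level):
        if level == self.k:
            # Top level reached: x is the remedian of all b**k values.
            self.result = x
            return
        buf = self.buffers[level]
        buf.append(x)
        if len(buf) == self.b:
            # Buffer full: pass its median up one level and reuse the space.
            m = statistics.median(buf)
            buf.clear()
            self._insert(m, level + 1)

# Example: b = 3, k = 2, so n = 9 observations.
r = Remedian(3, 2)
for x in [7, 1, 5, 9, 2, 8, 4, 6, 3]:
    r.push(x)
print(r.result)  # 5, which is also the exact median of 1..9 here
```

Note that the remedian generally approximates rather than equals the full-sample median; the equivariance under monotone transformations follows because each step takes only medians, which commute with monotone maps.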
