Performance of Some Resistant Rules for Outlier Labeling

Abstract
The techniques of exploratory data analysis include a resistant rule for identifying possible outliers in univariate data. Using the lower and upper fourths, FL and FU (approximate quartiles), it labels as “outside” any observations below FL − 1.5(FU − FL) or above FU + 1.5(FU − FL). For example, in the ordered sample −5, −2, 0, 1, 8, FL = −2 and FU = 1, so any observation below −6.5 or above 5.5 is outside. Thus the rule labels 8 as outside. Some related rules also use cutoffs of the form FL − k(FU − FL) and FU + k(FU − FL). This approach avoids the need to specify the number of possible outliers in advance; as long as they are not too numerous, any outliers do not affect the location of the cutoffs. To describe the performance of these rules, we define the some-outside rate per sample as the probability that a sample will contain one or more outside observations. Its complement is the all-inside rate per sample. We also define the outside rate per observation as the average fraction of outs...
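
The labeling rule and the some-outside rate described above can be illustrated with a minimal Python sketch. Several details here are assumptions rather than anything stated in the abstract: the fourths are computed with the usual exploratory-data-analysis depth convention (depth of fourth = (floor(depth of median) + 1) / 2), the function names `fourths`, `label_outside`, and `some_outside_rate` are hypothetical, and the Monte Carlo estimate draws standard Gaussian samples purely for illustration.

```python
import math
import random


def fourths(sample):
    """Return (FL, FU) using the depth-based definition (an assumption here)."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    # Depth of the median is (n + 1) / 2; depth of the fourths is derived from it.
    depth = (math.floor((n + 1) / 2) + 1) / 2
    lo, hi = math.floor(depth), math.ceil(depth)
    # A half-integer depth averages the two neighbouring order statistics.
    f_lower = (xs[lo - 1] + xs[hi - 1]) / 2
    f_upper = (xs[n - lo] + xs[n - hi]) / 2
    return f_lower, f_upper


def label_outside(sample, k=1.5):
    """Return (lower cutoff, upper cutoff, list of outside observations)."""
    f_lower, f_upper = fourths(sample)
    spread = f_upper - f_lower
    lower_cut = f_lower - k * spread
    upper_cut = f_upper + k * spread
    outside = [x for x in sample if x < lower_cut or x > upper_cut]
    return lower_cut, upper_cut, outside


def some_outside_rate(n, k=1.5, trials=2000, seed=0):
    """Monte Carlo estimate of the probability that a sample of size n has
    at least one outside observation (Gaussian population is an assumption)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if label_outside(sample, k)[2]:
            hits += 1
    return hits / trials


# Worked example from the abstract: cutoffs -6.5 and 5.5, with 8 labeled outside.
print(label_outside([-5, -2, 0, 1, 8]))  # (-6.5, 5.5, [8])
```

The all-inside rate per sample is simply one minus the value returned by `some_outside_rate`. Changing `k` shows how the related rules with other multipliers trade off between labeling too many and too few observations.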
