Detecting aberrant strains in bacterial groups as an aid to constructing databases for computer identification

Abstract
Computer-assisted identification systems require high-quality databases of test results for each species. One cause of poor quality is the inadvertent inclusion of strains that do not belong to a taxon; this can readily occur in groups where ancillary criteria (e.g. serology) are not available. A possible strategy is to exclude strains that are very atypical in their properties, i.e. that are strongly outlying, provided an objective criterion can be used.
A computer program, OUTLIER, for the detection of outlying strains in bacterial clusters was evaluated. A brief description of the theory and operation of the program is given. As an objective criterion, the program uses the degree to which the strain data fit a chi-square distribution. This allows easy identification of aberrant strains that should be excluded when constructing a database.
The program uses binary (1,0) data, and calculations are based on a choice of one of four identification coefficients. The relative merits of these four coefficients were examined for eight sets of bacterial data. Two of the coefficients, -log10 Willcox likelihood and Taxonomic distance squared, showed little significant difference, and we recommend these for routine work, with the first being the more useful. The Pattern distance squared was useful in indicating where atypical strains may be metabolically less active or slow-growing members of a cluster rather than true outliers. The Variance-weighted Taxonomic distance squared behaved anomalously, and we do not recommend it.
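To make the two recommended coefficients concrete, the following is a minimal sketch of how each strain in a cluster of 1,0 data might be scored. It is not the OUTLIER program: the function names, the example cluster, the clamping value `eps`, and the formulas (the commonly used forms of the Willcox likelihood and of the mean squared taxonomic distance to the cluster's test frequencies) are assumptions made for illustration, and the chi-square fitting step is not reproduced.

```python
import math

def column_frequencies(cluster):
    """Proportion of positive (1) results for each test across the cluster."""
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]

def willcox_neg_log10(strain, probs, eps=0.01):
    """-log10 of the Willcox likelihood (assumed form: product over tests of
    p for a positive result and 1 - p for a negative result)."""
    total = 0.0
    for x, p in zip(strain, probs):
        p = min(max(p, eps), 1.0 - eps)  # assumed clamp to avoid log10(0)
        total -= math.log10(p if x == 1 else 1.0 - p)
    return total

def taxonomic_distance_sq(strain, probs):
    """Mean squared difference between the strain and the test frequencies
    (assumed form of Taxonomic distance squared)."""
    return sum((x - p) ** 2 for x, p in zip(strain, probs)) / len(probs)

# Hypothetical cluster: rows are strains, columns are tests (1 = positive).
cluster = [
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],  # deliberately atypical strain
]

# Simplification: each strain is scored against frequencies that include itself.
probs = column_frequencies(cluster)
willcox_scores = [willcox_neg_log10(s, probs) for s in cluster]
distance_scores = [taxonomic_distance_sq(s, probs) for s in cluster]

# Higher scores mean a poorer fit to the cluster; report the worst-fitting strain.
most_atypical_willcox = willcox_scores.index(max(willcox_scores))
most_atypical_distance = distance_scores.index(max(distance_scores))
print(most_atypical_willcox, most_atypical_distance)
```

In this toy cluster both coefficients rank the last strain as the poorest fit. Deciding whether such a strain is a true outlier would still need an objective cut-off, such as the chi-square criterion described above.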