Aggregation of existing geographic regions to diminish spurious variability of disease rates

Abstract
The availability of large data sets, together with the growth in the power and storage capabilities of computers, has made the analysis of the spatial distribution of disease rates an increasingly important tool in public health research. Use of existing geographic divisions or groupings tends to result either in unstable estimates of disease rates, if the corresponding populations are small, or in loss of spatial resolution, if the areas are unnecessarily large. This paper describes a computer algorithm for combining existing geographic areas into regions with populations large enough to diminish spurious variability in disease rates while limiting the loss in resolution. The method is demonstrated using Medicare hospital admissions data for pneumonia and central nervous system cancer. Disease rates were calculated both for predefined regions and for those generated by the algorithm, and their frequency distributions were compared. The algorithm produces more stable rates over a variety of diseases and provides substantially more flexibility than the use of predefined aggregations.
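
The general idea of combining adjacent areas until each region reaches a population threshold can be illustrated with a minimal Python sketch. This is not the algorithm described in the paper, whose details the abstract does not give. It is a simple greedy merge written under stated assumptions: the function name `aggregate_areas`, the `min_pop` threshold, the adjacency map, and the example populations are all hypothetical.

```python
from typing import Dict, List, Set


def aggregate_areas(population: Dict[str, int],
                    neighbors: Dict[str, Set[str]],
                    min_pop: int) -> List[Set[str]]:
    """Greedily merge adjacent areas until every region that still has a
    neighboring region holds at least min_pop people.

    Hypothetical illustration only; not the method of the paper.
    """
    region_of = {a: a for a in population}   # area -> id of its current region
    members = {a: {a} for a in population}   # region id -> areas it contains
    pop = dict(population)                   # region id -> total population

    def adjacent_regions(r: str) -> Set[str]:
        # Regions that share a border with any area belonging to region r.
        found = set()
        for area in members[r]:
            for n in neighbors.get(area, ()):
                if region_of[n] != r:
                    found.add(region_of[n])
        return found

    while True:
        # Regions below the threshold that can still be merged with a neighbor.
        small = [r for r in members
                 if pop[r] < min_pop and adjacent_regions(r)]
        if not small:
            break
        # Take the least populous region first (ties broken by name).
        r = min(small, key=lambda x: (pop[x], x))
        # Merge it into its least populous neighbor to keep regions small
        # and so limit the loss of spatial resolution.
        target = min(adjacent_regions(r), key=lambda x: (pop[x], x))
        absorbed = members.pop(r)
        for area in absorbed:
            region_of[area] = target
        members[target] |= absorbed
        pop[target] += pop.pop(r)

    return sorted(members.values(), key=lambda s: sorted(s))


# Hypothetical example: four areas arranged in a line, A - B - C - D.
example_population = {"A": 300, "B": 800, "C": 1200, "D": 200}
example_neighbors = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
regions = aggregate_areas(example_population, example_neighbors, min_pop=1000)
```

In this sketch, merging the smallest region into its smallest neighbor is only one possible heuristic. A practical implementation would also need a rule for areas with no neighbors and might weigh geographic compactness alongside population.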