Oblique decision trees for spatial pattern detection: optimal algorithm and application to malaria risk
Open Access
- 18 July 2005
- journal article
- research article
- Published by Springer Nature in BMC Medical Research Methodology
- Vol. 5 (1) , 22
- https://doi.org/10.1186/1471-2288-5-22
Abstract
Background: To detect potential disease clusters where a putative source cannot be specified, classical procedures scan the geographical area with circular windows through a specified grid imposed on the map. However, the choice of the windows' shapes, sizes and centers is critical, and different choices may not provide exactly the same results.
The aim of our work was to use an Oblique Decision Tree (ODT) model, which provides potential clusters without pre-specifying shapes, sizes or centers. For this purpose, we have developed an ODT algorithm to find an oblique partition of the space defined by the geographic coordinates.
Methods: ODT is based on the classification and regression tree (CART). Whereas CART finds rectangular partitions of the covariate space, ODT provides oblique partitions maximizing the interclass variance of the independent variable. Since finding such a partition is an NP-hard problem in R^N, classical ODT algorithms use evolutionary procedures or heuristics. We have developed an optimal ODT algorithm in R^2, based on the directions defined by each pair of point locations. This partition provided potential clusters, which can be tested with Monte-Carlo inference.
We applied the ODT model to a dataset in order to identify potential high-risk clusters of malaria in a village in Western Africa during the dry season. The ODT results were compared with those of Kulldorff's SaTScan™.
Results: The ODT procedure provided four classes of risk of infection. In the first high-risk class, 60% of the children were infected (95% confidence interval (CI95%) [52.22–67.55]). Monte-Carlo inference showed that the spatial pattern issued from the ODT model was significant (p < 0.0001).
SaTScan yielded one significant cluster where the risk of disease was high, with an infection rate of 54.21%, CI95% [47.51–60.75]. Its center was located within the first high-risk ODT class. Both procedures provided similar results, identifying a high-risk cluster in the western part of the village where a mosquito breeding site was located.
Conclusion: ODT models improve on the classical scanning procedures by detecting potential disease clusters independently of any specification of the shapes, sizes or centers of the clusters.
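The Methods describe a search in R^2 over directions defined by pairs of point locations, followed by Monte-Carlo inference. The sketch below illustrates that idea for a single split only and is not the authors' implementation. It assumes a binary outcome per location (for example, infected or not), scores a split by the between-class sum of squares of that outcome, and assumes a null model in which outcomes are randomly relabelled across locations. The function names (`best_oblique_split`, `permutation_p_value`) and the toy data are hypothetical.

```python
import random
from itertools import combinations


def best_oblique_split(points, outcome):
    """Search one oblique split of 2-D points (illustrative sketch).

    Candidate directions are those defined by each pair of points.
    For each direction, points are projected onto its normal and every
    threshold between distinct projections is scored by the
    between-class sum of squares of the outcome.
    Returns (score, normal, threshold); normal is None if no split exists.
    """
    n = len(points)
    total = sum(outcome)
    best = (0.0, None, None)
    for (x1, y1), (x2, y2) in combinations(points, 2):
        dx, dy = x2 - x1, y2 - y1
        if dx == 0 and dy == 0:
            continue  # duplicate locations define no direction
        nx, ny = -dy, dx  # normal to the direction through the pair
        proj = sorted(
            (nx * x + ny * y, z) for (x, y), z in zip(points, outcome)
        )
        left_sum = 0.0
        for k in range(1, n):
            left_sum += proj[k - 1][1]
            if proj[k][0] == proj[k - 1][0]:
                continue  # a threshold cannot separate tied projections
            mean_left = left_sum / k
            mean_right = (total - left_sum) / (n - k)
            score = k * (n - k) / n * (mean_left - mean_right) ** 2
            if score > best[0]:
                threshold = (proj[k - 1][0] + proj[k][0]) / 2
                best = (score, (nx, ny), threshold)
    return best


def permutation_p_value(points, outcome, n_perm=199, seed=0):
    """Monte-Carlo p-value under random relabelling (assumed null model)."""
    rng = random.Random(seed)
    observed = best_oblique_split(points, outcome)[0]
    shuffled = list(outcome)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_oblique_split(points, shuffled)[0] >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Toy data (hypothetical): a 4 x 4 grid with cases in one corner,
# separated from non-cases by a diagonal boundary.
grid = [(x, y) for x in range(4) for y in range(4)]
cases = [1 if x + y <= 2 else 0 for x, y in grid]
score, normal, threshold = best_oblique_split(grid, cases)
print(score, normal, threshold)
print(permutation_p_value(grid, cases, n_perm=99))
```

In the toy data no axis-parallel cut separates cases from non-cases, whereas a diagonal cut does, which is the situation an oblique split is meant to handle. A full tree would apply the search recursively within each resulting class. The brute-force search above costs roughly O(n^3 log n) per split, so it is only practical for small point sets.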