Towards a scientific workflow methodology for primary care database studies

Abstract
We describe the challenges of conducting studies based on mining large-scale primary care databases: data integration, data set definition, and the reproducibility and reusability of results. These correspond to the higher-level informatics challenges of automation, provenance capture and component integration. We give a high-level view of the informatics infrastructure that addresses these challenges through generic, workflow-based e-Science middleware. We also describe our experience of using the system to investigate differences in the health status of patients with diabetes before and after the national introduction of the UK GP contract in 2004.