Evaluations of emergency medical service (EMS) programs have been ambiguous, due in part to problems of sample definition. Four sampling strategies were studied: 1) all patients in cardiac arrest; 2) patients with a final diagnosis of myocardial infarction (MI); 3) patients with an emergency room diagnosis of "rule out MI"; and 4) patients identified by the ambulance team as possible MI cases. Using a regional database of all ambulance runs, we created study samples based on each of these strategies and measured the error that may be introduced as a result of sample selection. Bias was measured along three parameters of EMS system performance: 1) observed incidence of MI in the ambulance system; 2) condition recognition, defined as the ability of the ambulance team to correctly identify acute cardiac patients; and 3) emergency room and hospital mortality rates. The emergency room diagnosis strategy systematically excludes all false positives, while samples based on the ambulance team's assessment omit all false negatives. The final diagnosis strategy yields significant underestimates of cardiac mortality. Samples restricted to cardiac arrests result in biased estimates of both the incidence of MI and the number of deaths.
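
The way a sampling rule can remove an entire error category by construction may be easier to see in a small sketch. The Python example below is illustrative only: the records are invented, the names (`RUNS`, `classify`, `tabulate`, `sensitivity`, `ppv`) are hypothetical, and treating the emergency room diagnosis as the reference standard for the ambulance team's assessment is an assumption made here, not a description of the study's actual method or data.

```python
from collections import Counter

# Hypothetical ambulance runs, invented for illustration only.
# Each record is (ambulance_flagged_mi, er_rule_out_mi).
RUNS = [
    (True, True), (True, True), (True, True),   # both flagged possible MI
    (True, False), (True, False),               # flagged by ambulance only
    (False, True),                              # flagged by emergency room only
    (False, False), (False, False),
    (False, False), (False, False),             # flagged by neither
]

CATEGORIES = ("true_positive", "false_positive",
              "false_negative", "true_negative")


def classify(ambulance_flag, er_flag):
    """Label one run, taking the ER diagnosis as the assumed reference."""
    if ambulance_flag and er_flag:
        return "true_positive"
    if ambulance_flag:
        return "false_positive"
    if er_flag:
        return "false_negative"
    return "true_negative"


def tabulate(runs):
    """Count runs in each category, reporting zero for empty categories."""
    counts = Counter(classify(a, e) for a, e in runs)
    return {name: counts.get(name, 0) for name in CATEGORIES}


def sensitivity(table):
    """Share of reference-positive runs that the ambulance team flagged."""
    positives = table["true_positive"] + table["false_negative"]
    return table["true_positive"] / positives if positives else None


def ppv(table):
    """Share of ambulance-flagged runs confirmed by the reference."""
    flagged = table["true_positive"] + table["false_positive"]
    return table["true_positive"] / flagged if flagged else None


# Full regional sample versus two restricted samples.
all_runs = tabulate(RUNS)
er_sample = tabulate([r for r in RUNS if r[1]])        # ER "rule out MI" only
ambulance_sample = tabulate([r for r in RUNS if r[0]]) # ambulance-flagged only

for label, table in [("all runs", all_runs),
                     ("ER sample", er_sample),
                     ("ambulance sample", ambulance_sample)]:
    print(label, table, "sensitivity:", sensitivity(table), "ppv:", ppv(table))
```

In this toy data the full sample shows both kinds of error, the emergency room sample contains no false positives, and the ambulance sample contains no false negatives, so each restricted sample makes one measure of condition recognition look perfect regardless of how the team actually performed.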