Nonexperimental Versus Experimental Estimates of Earnings Impacts

Abstract
To assess nonexperimental (NX) evaluation methods in the context of welfare, job training, and employment services programs, the authors reexamined twelve case studies, each of which used NX methods to try to replicate impact estimates from an experimental evaluation. They found that the NX methods sometimes came close to replicating the experimentally derived results but often produced estimates that differed from them by policy-relevant margins; the authors interpret these discrepancies as estimates of bias. Although the authors identified several study design factors associated with smaller discrepancies, no combination of factors consistently eliminated them. Even across a large number of impact estimates, the positive and negative bias estimates did not always cancel each other out. Thus, it was difficult to identify an aggregation strategy that consistently removed bias while still answering a focused question about a program's earnings impacts. They conclude that although the empirical evidence from this literature can be used to improve NX research designs in the context of training and welfare programs, it cannot on its own justify the use of such designs.