Over at Development Impact, David McKenzie summarizes a new paper that looks at the external validity of field experiments by comparing the results of the same experiment run in different places. The logic is simple: use the results of one experiment to predict, based on observed characteristics, what should happen in another location. These extrapolations perform poorly, which implies that unobserved factors determine how effective the experimental treatment is, and that those factors vary meaningfully across locations.
The most important part of the post is what McKenzie refers to as “partner selection bias”: the partners (or experiment sites, or whatever) chosen for impact evaluations appear to be selected for their likelihood of performing “well” – that is, of showing a positive treatment effect. This would clearly be a Bad Thing, because it would mean our results are biased upwards – we see effects where there really are none. Strikingly, McKenzie avoids mentioning the Busia District of Kenya in his discussion of this problem. Busia has been the site of a large number of famous (and very high-quality) development RCTs, including Miguel and Kremer’s seminal de-worming study. There’s a decent undercurrent of worry that results from Busia may not extrapolate well, since experiments run elsewhere have been less “successful” (as measured by finding a significant effect). This issue is highlighted by Kremer et al. (2009), who tried to run a school-level experiment on incentives for better test scores in both Busia and neighboring Teso. Compliance in Teso was so poor, and so negatively correlated with baseline performance, that they essentially had to drop the district from their analysis.
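The mechanics of partner selection bias are easy to see in a stylized simulation (all numbers below are invented for illustration, not drawn from the paper): if evaluation sites are picked from the top of the distribution of site-level treatment effects rather than at random, the average effect measured across evaluations will overstate the effect in the broader population of sites.

```python
import random
import statistics

random.seed(0)

# Hypothetical setup: 1,000 potential sites, each with its own true
# treatment effect, drawn around a modest population mean of 0.1.
population_effects = [random.gauss(0.1, 0.3) for _ in range(1000)]

# Unbiased benchmark: 50 evaluation sites chosen at random.
random_sites = random.sample(population_effects, 50)

# "Partner selection": evaluators end up at the sites most likely to
# perform well -- here, crudely, the top 50 by true effect.
selected_sites = sorted(population_effects, reverse=True)[:50]

print(f"population mean effect: {statistics.mean(population_effects):.2f}")
print(f"randomly chosen sites:  {statistics.mean(random_sites):.2f}")
print(f"selected partners:      {statistics.mean(selected_sites):.2f}")
```

The randomly chosen sites recover something close to the population mean, while the selected partners can overstate it several-fold; in practice selection would work through noisy predictors of performance rather than the true effect itself, which shrinks but does not eliminate the bias.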
It’s possible that Busia just didn’t cross McKenzie’s mind, but an alternative hypothesis is that he’s treading carefully, possibly because he’s wiser about these things than I am. But it’s a point that needs to be made: even though experimental evaluations still outperform observational analyses, we should be cautious in accepting results from certain places at face value. Busia is just one example of a place where development RCTs are probably over-concentrated and possibly biased. Malawi, where my own research centers, is another likely candidate. Before going full-bore with a policy, some kind of nationwide, staggered, randomized rollout seems advisable (think PROGRESA writ large). Otherwise we run the risk of pushing for programs that aren’t cost-effective, or, in the worst case, even harmful.