My colleague Adam rightfully approves of the spread of RCTs as a means of evaluating interventions, but raises a couple of important concerns about drawing inferences with them. His broader worry about the generalizability of results is well taken: especially in economics, many people are dubious that we can learn anything broadly useful from experiments, even when the internal inferences of those experiments are valid. However, Adam’s specific example of a source of these issues (artificially constraining the study population for randomized controlled trials) is troubling because it’s completely unnecessary. If you’re running a randomized experiment you don’t need ex ante restrictions on your sample. They’re not necessary for causal inference, and as Adam points out they limit the extent to which you can generalize your results. In fact, not only is such a restriction unnecessary, eliminating the need for restrictions is the entire point of experiments!
Think about doing a non-experimental evaluation of a blood pressure medication. That means you look at the people who took the drug and the ones who didn’t and see who had better outcomes. The first thing you might worry about is that people with initially higher blood pressure take the drug, so even if it works your comparison might appear to show no effect – or even a negative one. The general problem here is that your treatment and control groups are unbalanced on observed factors. It’s easy to think of lots of other factors that could affect the trend in blood pressure, like age and diet. If you don’t run an experiment, you need to control for those factors in your analysis. We often do that through a linear regression, but another approach is matching: for each drug user, find a non-user who looks the same as a basis for comparison. The problem with either approach is that there are lots of factors we can’t see because they’re never recorded in the data (genetics, environmental factors, behaviors, etc.). If those unobserved factors are correlated with the treatment, then our estimates will be biased.
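To make the confounding problem concrete, here’s a toy simulation (all numbers made up, not from any real trial) where sicker people select into taking the drug. The naive comparison gets the sign of the effect wrong; regression adjustment fixes it, but only because we let ourselves observe the confounder – which in real data we often can’t:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Confounder: how sick each person is to begin with.
severity = rng.normal(0.0, 1.0, n)

# Sicker people are more likely to take the drug (selection, not randomization).
treated = (severity + rng.normal(0.0, 1.0, n) > 0).astype(float)

# True effect: the drug lowers blood pressure by 5 points.
bp = 140 + 10 * severity - 5 * treated + rng.normal(0.0, 2.0, n)

# Naive difference in means: the drug appears to RAISE blood pressure,
# because the treated group was sicker to start with.
naive = bp[treated == 1].mean() - bp[treated == 0].mean()

# If severity is observed, regression adjustment recovers the true effect.
X = np.column_stack([np.ones(n), treated, severity])
coef, *_ = np.linalg.lstsq(X, bp, rcond=None)
adjusted = coef[1]  # close to -5
```

The fix only works because the simulation records `severity`; drop that column and no regression or matching on the remaining variables can save you.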
To get around this problem, we just randomize! Assigning the treatment randomly ensures that, with a big enough sample, the treatment and control groups will be balanced on all variables – observed and unobserved alike. That means you never have to worry about heterogeneity in your experiment population – leave everyone in and let the random number generator sort them out. In fact, more heterogeneity is often better: in many cases the impact of the treatment will vary with observed factors like race, sex or age, and if you include a diverse group you can estimate those heterogeneous effects directly.
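A quick simulation (again, entirely made-up numbers) shows the point: when treatment is assigned by coin flip, the two groups end up balanced even on a variable the analyst never sees, and the simple difference in means recovers the true effect with no controls at all:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# A confounder the analyst never observes.
severity = rng.normal(0.0, 1.0, n)

# Treatment assigned by a literal coin flip, ignoring severity entirely.
treated = rng.integers(0, 2, n).astype(float)

# True effect: the drug lowers blood pressure by 5 points.
bp = 140 + 10 * severity - 5 * treated + rng.normal(0.0, 2.0, n)

# Randomization balances even the unobserved variable...
balance = severity[treated == 1].mean() - severity[treated == 0].mean()

# ...so the naive difference in means is now the right answer.
effect = bp[treated == 1].mean() - bp[treated == 0].mean()  # close to -5
```

No restrictions, no controls, no matching – the random number generator does all the work.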
That’s not to say that RCTs never have constrained populations. A lot of times study populations are restricted because it’s unavoidable in the experimental setup. For example, it’s hard to include Kenyans if I’m running a study in Malawi. Sometimes, on the other hand, a project will target a specific population because that’s the intervention they want to study – think about experiments on schoolchildren, which test the effectiveness of rolling out a policy to all schools. Ideally you want to run your experiment on a representative sample of the whole population that’s relevant for the intervention/policy/medication/etc. that you’re testing. How do we ensure that? You got it – we randomize. Specifically we choose a random sample of the entire population. In large enough samples that ensures that our study population looks like the overall population.
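The same logic applies to recruitment: draw the study sample uniformly at random and, in large samples, it mirrors the target population on every characteristic at once. A sketch with a hypothetical population (the variables here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
pop_n = 1_000_000

# Hypothetical target population with a couple of covariates.
pop_age = rng.integers(18, 90, pop_n)      # ages 18-89
pop_female = rng.integers(0, 2, pop_n)     # binary indicator

# Recruit the study sample uniformly at random, without replacement.
idx = rng.choice(pop_n, size=5_000, replace=False)

# The sample matches the population on both variables simultaneously –
# and would on any variable we hadn't even thought to measure.
age_gap = abs(pop_age[idx].mean() - pop_age.mean())
female_gap = abs(pop_female[idx].mean() - pop_female.mean())
```

Note this works for every variable at once, measured or not – which is exactly why you don’t need to hand-pick a “clean” study population.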
I don’t think any economists are imposing artificial constraints on their experiment populations. If they are, then whoever taught their program evaluation course should be tarred and feathered. But the fact that Adam is worried about it implies that people in medicine may be committing this statistical sin. That’s actually pretty understandable: in general, the more a discipline is able to run actual experiments, the less its practitioners need to know about statistics. Usually that limited knowledge makes little difference in a field like medicine, but this is a pretty important exception. Spread the word.