"Correlation is not causation" is one of the most-repeated phrases in empirical research. It is also, as usually understood, a dramatic understatement of the actual difficulty. The real challenge is not distinguishing correlation from causation — it is identifying which causal story is correct when several are consistent with the same data.
DoOperator Research · May 29, 2026
"Correlation is not causation" is one of the most-repeated phrases in empirical research. Students learn it early. Journalists invoke it to dismiss inconvenient findings. Researchers add it as a disclaimer before interpreting their results.
It is also, as usually understood, a dramatic understatement of the actual difficulty.
The real challenge of causal inference is not distinguishing correlation from causation — it is identifying which causal story is correct when several are consistent with the same observed data. Correlation is just the beginning of the problem.
The standard framing presents causation as a binary upgrade from correlation: you have correlation, and if you're careful and lucky you can confirm that it's also causation. The implication is that causation is correlation plus some additional evidence — a temporal sequence, perhaps, or a plausible mechanism.
This framing is wrong in a way that matters.
The fundamental problem is that any set of observed associations is consistent with many distinct causal structures. Two correlated variables could be related because A causes B, because B causes A, because a third variable C causes both, because the sample was selected in a way that induces a spurious association, or because of any combination of these. Knowing the magnitude of the correlation tells you essentially nothing about which of these is true.
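To make the point concrete, here is a minimal simulation sketch. The function names, the linear-Gaussian setup, and the target correlation of 0.5 are illustrative assumptions, not anything taken from a real study. It generates data from three different causal structures that are tuned to produce the same correlation between A and B.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(structure, n=20000, rho=0.5, seed=0):
    """Draw (A, B) pairs from a hypothetical causal structure and return corr(A, B).

    Every structure is parameterised so that the population correlation is rho.
    """
    rng = random.Random(seed)
    a_vals, b_vals = [], []
    for _ in range(n):
        if structure == "A->B":
            a = rng.gauss(0, 1)
            b = rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        elif structure == "B->A":
            b = rng.gauss(0, 1)
            a = rho * b + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        elif structure == "C->A,C->B":
            # Common cause: A and B never influence each other.
            c = rng.gauss(0, 1)
            load = math.sqrt(rho)
            a = load * c + math.sqrt(1 - rho) * rng.gauss(0, 1)
            b = load * c + math.sqrt(1 - rho) * rng.gauss(0, 1)
        else:
            raise ValueError(f"unknown structure: {structure}")
        a_vals.append(a)
        b_vals.append(b)
    return pearson(a_vals, b_vals)


if __name__ == "__main__":
    for s in ("A->B", "B->A", "C->A,C->B"):
        print(s, round(simulate(s), 3))
```

All three structures yield a correlation close to 0.5, yet intervening on A would change B in only the first of them. Increasing `n` tightens the estimates without separating the stories.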
This is not a problem of insufficient data. More data makes the correlations more precise. It does not help you choose between competing causal models, all of which predict the same pattern of associations.
Formal causal inference has a name for this: the identification problem. A causal quantity is identified if it can, in principle, be recovered from the joint distribution of observed variables. Many quantities we care about are not identified — they are simply not determinable from observational data alone, regardless of sample size.
Consider a simple case. You observe that people who take a certain medication have better health outcomes than those who do not. The causal effect of the medication is not identified from this observation, because the decision to take the medication is not random: sicker people may be more or less likely to take it, and this selection into treatment is correlated with the outcome. No amount of statistical adjustment recovers the causal effect unless you can account for all the factors that influenced both treatment selection and the outcome.
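A small simulation illustrates how selection into treatment distorts the naive comparison. Everything here is a made-up sketch: the single binary confounder `severe`, the treatment probabilities, the recovery rates, and the true effect of +0.10 are assumptions chosen for illustration.

```python
import random


def simulate_observational(n=50000, true_effect=0.10, seed=1):
    """Return (severe, treated, recovered) rows from a hypothetical observational study."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe = rng.random() < 0.5
        # Selection into treatment: sicker patients take the drug more often.
        treated = rng.random() < (0.8 if severe else 0.2)
        # Severity lowers baseline recovery; the drug adds true_effect.
        p_recover = (0.3 if severe else 0.7) + (true_effect if treated else 0.0)
        rows.append((severe, treated, rng.random() < p_recover))
    return rows


def recovery_rate(rows, treated, severe=None):
    """Recovery rate among rows with the given treatment (and severity, if set)."""
    sel = [r for s, t, r in rows if t == treated and (severe is None or s == severe)]
    return sum(sel) / len(sel)


def naive_difference(rows):
    """Treated minus untreated recovery rate, ignoring severity."""
    return recovery_rate(rows, True) - recovery_rate(rows, False)


def stratified_difference(rows):
    """Within-severity differences, averaged with weights equal to stratum shares."""
    n = len(rows)
    total = 0.0
    for severe in (True, False):
        share = sum(1 for s, _, _ in rows if s == severe) / n
        diff = recovery_rate(rows, True, severe) - recovery_rate(rows, False, severe)
        total += share * diff
    return total


if __name__ == "__main__":
    data = simulate_observational()
    print("naive:", round(naive_difference(data), 3))
    print("stratified:", round(stratified_difference(data), 3))
```

In this setup the naive difference is negative even though the drug helps, because the treated group is mostly severe cases. Stratifying on severity recovers roughly +0.10, but only because severity is, by construction, the sole confounder and is fully observed. With real data there is no way to verify that from the data alone.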
The word "all" is doing enormous work in that sentence. In any real application, you cannot be certain you have accounted for everything. The residual uncertainty is not quantified by your standard errors. It is a different kind of uncertainty — epistemic, structural — that frequentist inference does not address.
The reason randomized controlled trials are the gold standard for causal inference is not that they adjust confounding away after the fact. It is that they create a world in which confounding of treatment assignment cannot exist by construction. When treatment assignment is random, it is, by design, independent of every pre-treatment characteristic of the unit being assigned, measured or unmeasured. The causal effect is identified because you have intervened on the assignment mechanism, not merely observed it.
This is the insight behind Judea Pearl's do-calculus and the potential outcomes framework associated with Donald Rubin. The key object is not the conditional probability P(Y | X = x) — the probability of outcome Y given that you observe X equal to x — but rather P(Y | do(X = x)) — the probability of outcome Y when you intervene to set X to x. These are different quantities, and observational data alone cannot bridge the gap between them except under strong assumptions.
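As one illustration of what "strong assumptions" can mean, suppose a set of observed covariates Z satisfies Pearl's back-door criterion relative to X and Y. Z is a symbol introduced here for the sketch, and it is taken to be discrete for simplicity. Under that assumption the interventional distribution can be written in terms of observable quantities, and comparing it with the ordinary conditional shows exactly where the two differ:

```latex
\begin{align*}
P(Y \mid X = x) &= \sum_{z} P(Y \mid X = x, Z = z)\, P(Z = z \mid X = x) \\
P(Y \mid \mathrm{do}(X = x)) &= \sum_{z} P(Y \mid X = x, Z = z)\, P(Z = z)
\end{align*}
```

The only difference is the weighting: observation weights each stratum by how common it is among units that happen to have X = x, while intervention weights it by how common it is in the whole population. The two coincide when Z is independent of X. Whether a given Z actually satisfies the criterion is an assumption about the causal structure, not something the formula can check.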
The randomized experiment collapses this distinction. When you randomize, observing X = x and intervening to set X = x produce the same distribution of outcomes. That is why the experiment is informative about causation in a way that observation is not.
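A minimal simulation sketch of this collapse follows. The binary `severe` characteristic, the recovery rates, and the true effect of +0.10 are hypothetical values chosen for illustration. Treatment is assigned by a coin flip that ignores everything about the patient.

```python
import random


def simulate_trial(n=50000, true_effect=0.10, seed=2):
    """Return (severe, treated, recovered) rows from a hypothetical randomized trial."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe = rng.random() < 0.5
        treated = rng.random() < 0.5  # coin flip, independent of severity
        p_recover = (0.3 if severe else 0.7) + (true_effect if treated else 0.0)
        rows.append((severe, treated, rng.random() < p_recover))
    return rows


def difference_in_means(rows):
    """Treated minus control recovery rate, with no adjustment at all."""
    treated = [r for _, t, r in rows if t]
    control = [r for _, t, r in rows if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)


def share_severe(rows, treated):
    """Fraction of severe patients in the given arm."""
    arm = [s for s, t, _ in rows if t == treated]
    return sum(arm) / len(arm)


if __name__ == "__main__":
    trial = simulate_trial()
    print("severe share, treated:", round(share_severe(trial, True), 3))
    print("severe share, control:", round(share_severe(trial, False), 3))
    print("difference in means:", round(difference_in_means(trial), 3))
```

Both arms end up with about the same share of severe patients, so the unadjusted difference in means lands near the true effect without anyone having to measure severity, or even know that it matters.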
Even a well-designed randomized trial faces a challenge that the correlation/causation dichotomy obscures: the identified causal effect may not generalize.
A trial establishes the average treatment effect in its study population. This is a genuine causal quantity. But it may not be the causal quantity you actually care about, which is the effect in some target population — future patients, different markets, other contexts. The gap between the study population and the target population is a causal problem, not a statistical one. It is not addressed by larger samples or better randomization within the trial.
Heterogeneity compounds this. The average treatment effect conceals variation across individuals, sites, and contexts. An intervention that produces a +3% average effect may produce +15% for some subgroups and -5% for others. Individual-level effects are not identified from a trial at all. Effects for subgroups defined by pre-treatment characteristics are identified in principle, but they are usually estimated too imprecisely to be trusted unless the trial was designed and powered for them, for example through stratified randomization or separate experiments. The pressure to report a single number obscures the distribution of effects that actually matters for decision-making.
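The arithmetic behind those figures is worth spelling out. The sketch below assumes exactly two subgroups and invents the population shares; the subgroup names and both sets of shares are hypothetical. It shows that +15% and -5% average to +3% only for one particular mix, and that the same subgroup effects give a different answer in a population with a different mix.

```python
def average_effect(subgroup_effects, shares):
    """Population-average effect as a share-weighted mean of subgroup effects."""
    if set(subgroup_effects) != set(shares):
        raise ValueError("effects and shares must cover the same subgroups")
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(subgroup_effects[g] * shares[g] for g in subgroup_effects)


# Subgroup effects from the text; the labels are illustrative.
effects = {"benefits": 0.15, "harmed": -0.05}

# Hypothetical subgroup mix in the trial and in a different target population.
trial_shares = {"benefits": 0.40, "harmed": 0.60}
target_shares = {"benefits": 0.10, "harmed": 0.90}

if __name__ == "__main__":
    print("trial ATE:", round(average_effect(effects, trial_shares), 4))
    print("target ATE:", round(average_effect(effects, target_shares), 4))
```

With a 40/60 mix the average is +3%. With a 10/90 mix the same intervention has an average effect of -3%. Nothing about the trial was wrong; the average simply answers a question about a different population.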
The correlation/causation slogan, in its usual invocation, suggests that the solution to confounding is methodological discipline: control for the right variables, use instrumental variables, run a regression discontinuity. These methods are valuable. They do not solve the identification problem. They replace it with different assumptions, such as the exclusion restriction for an instrument or continuity at the cutoff for a regression discontinuity, whose plausibility must largely be argued rather than tested.
The honest position is that causal inference from observational data requires a causal model — an explicit set of assumptions about the structure of the data-generating process. The assumptions can be more or less plausible. They can be subjected to sensitivity analysis. But they cannot be eliminated. Every observational causal claim is conditional on a causal model, and the model is not in the data.
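As one example of what a sensitivity analysis can look like, the sketch below computes the E-value proposed by VanderWeele and Ding (2017). It is one published technique among several, chosen here purely for illustration. The E-value is the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both treatment and outcome to fully explain away an observed risk ratio. The function name and the example risk ratio are assumptions.

```python
import math


def e_value(risk_ratio):
    """E-value for an observed risk ratio (point estimate only).

    Risk ratios below 1 are inverted first, so protective and harmful
    associations of equal strength get the same E-value.
    """
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    rr = risk_ratio if risk_ratio >= 1 else 1.0 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1.0))


if __name__ == "__main__":
    # Hypothetical observed association: treated units have twice the risk.
    print(round(e_value(2.0), 2))
```

For an observed risk ratio of 2, the E-value is about 3.41: a confounder weaker than that, in its association with both treatment and outcome, could not by itself account for the whole association. The calculation does not say whether such a confounder exists. It only makes explicit how strong a violation of the no-unmeasured-confounding assumption would have to be.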
This is not a counsel of despair. It is a counsel of transparency. The goal is not to pretend that observational evidence is as clean as experimental evidence — it never is — but to be explicit about what assumptions are required, what violations of those assumptions would do to the conclusions, and what further evidence would change the picture.
Correlation was never the problem. The problem is that causal inference is hard, the assumptions required are rarely stated, and the uncertainty that remains after careful analysis is rarely quantified. The slogan papers over this difficulty with a simple rule that, once satisfied, appears to license confident causal claims.
It does not. The work of causal inference is in the details of the identification strategy, and no amount of methodological sophistication eliminates the need to state and defend the assumptions on which conclusions rest, and to test them wherever they have testable implications.