DoOperator Research · May 13, 2026
Hook: The Dirty Secret of Statistics 101
Every introductory stats course hammers home the mantra: "Correlation is not causation." It’s repeated so often it becomes a reflex—a way to sound sophisticated when someone presents a surprising chart. But here’s the dirty secret: most courses never tell you what to do instead. They teach you to identify the sin, but not how to achieve redemption.
This post fixes that. We’ll walk through the five essential frameworks that turn correlation into causation—not as abstract philosophy, but as a practical toolkit you can use in your next analysis.
Causal inference is hard because of a single, maddening fact: we never observe the counterfactual.
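If a patient takes a drug and recovers, we don’t know what would have happened if they hadn’t taken it. That unobserved alternative is the counterfactual, and comparing it with what actually happened is what defines a causal effect. Without it, we’re left comparing treated and untreated groups and hoping they’re exchangeable, meaning the untreated group shows what would have happened to the treated group without treatment. Spoiler: they rarely are.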
The entire field of causal inference is built on clever ways to approximate that missing counterfactual. Here’s how.
Step 1: Potential Outcomes and the Rubin Causal Model
When it applies: You have a well-defined treatment and control group, and you care about individual-level or average treatment effects.
The Rubin Causal Model formalizes the counterfactual problem. For each unit $i$, there are two potential outcomes: $Y_i(1)$ if treated and $Y_i(0)$ if not. The causal effect for unit $i$ is $\tau_i = Y_i(1) - Y_i(0)$. The problem: you only ever observe one of them.
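Key assumptions: SUTVA (each unit’s outcome depends only on its own treatment, and there is only one version of the treatment), ignorability (treatment assignment is independent of the potential outcomes, possibly after conditioning on observed covariates), and positivity (every unit has a nonzero probability of receiving each treatment).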
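From here, we define the average treatment effect, $\text{ATE} = E[Y_i(1) - Y_i(0)]$, and the average treatment effect on the treated, $\text{ATT} = E[Y_i(1) - Y_i(0) \mid T_i = 1]$, where $T_i$ is unit $i$’s treatment indicator.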
When to use: This is the default framework for most applied work—especially in economics, medicine, and policy evaluation.
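Both potential outcomes are visible only inside a simulation, which makes a toy simulation a cheap way to watch the fundamental problem in action. The sketch below is entirely hypothetical (the `severity` confounder, the effect size of -2, and the helper names are illustrative assumptions): sicker patients are more likely to take a drug that lowers their symptom score, so a naive comparison of treated and untreated patients makes the drug look harmful, while coin-flip assignment recovers the true average effect.

```python
import random
import statistics


def simulate_units(n, seed=0):
    """Return (severity, y0, y1) for n hypothetical patients.

    Only a simulation lets us see BOTH potential outcomes for every unit.
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        severity = rng.random()               # confounder, uniform on [0, 1)
        y0 = 10 * severity + rng.gauss(0, 1)  # symptom score without the drug
        y1 = y0 - 2.0                         # the drug lowers every unit's score by 2
        units.append((severity, y0, y1))
    return units


def difference_in_means(units, p_treat, seed):
    """Assign treatment with probability p_treat(severity), reveal ONE
    potential outcome per unit, and compare the group means."""
    rng = random.Random(seed)
    treated, control = [], []
    for severity, y0, y1 in units:
        if rng.random() < p_treat(severity):
            treated.append(y1)   # treated units reveal only Y(1)
        else:
            control.append(y0)   # control units reveal only Y(0)
    return statistics.fmean(treated) - statistics.fmean(control)


units = simulate_units(100_000)
true_ate = statistics.fmean(y1 - y0 for _, y0, y1 in units)  # needs both outcomes
naive = difference_in_means(units, p_treat=lambda s: s, seed=1)         # sicker units self-select
randomized = difference_in_means(units, p_treat=lambda s: 0.5, seed=2)  # coin-flip assignment
print(f"true ATE {true_ate:.2f} | self-selected {naive:.2f} | randomized {randomized:.2f}")
```

With these settings the self-selected comparison should land near +1.33: treated units have average severity 2/3 versus 1/3 for controls, so the 10-point severity scale adds about 3.33 and swamps the true effect of -2. The randomized comparison should land near -2.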
Step 2: Causal Diagrams and Do-Calculus
When it applies: You have a causal diagram (or can sketch one), and you need to decide which variables to condition on and which to avoid.
Judea Pearl’s Directed Acyclic Graphs (DAGs) give you a visual language for causal assumptions. Each arrow represents a direct causal effect. The key insight: you can read off identification strategies from the graph.
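The Backdoor Criterion: A set of variables Z satisfies the backdoor criterion relative to a treatment T and an outcome Y if (1) no variable in Z is a descendant of T, and (2) Z blocks every backdoor path between T and Y, that is, every path that begins with an arrow pointing into T.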
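If Z satisfies the criterion and the graph is correct, conditioning on Z identifies the causal effect through the adjustment formula $P(Y = y \mid do(T = t)) = \sum_z P(Y = y \mid T = t, Z = z)\,P(Z = z)$, so you can estimate it from observational data without confounding bias.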
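To see backdoor adjustment at work, here is a minimal simulation of the simplest confounded graph, Z → T, Z → Y, T → Y, with all three variables binary. The probabilities are hypothetical, chosen so that T raises P(Y = 1) by exactly 0.3. The naive contrast P(Y=1 | T=1) - P(Y=1 | T=0) also absorbs the Z → Y path, while averaging the within-stratum contrasts over P(Z), as in the helper `p_y1_do` (a name made up for this sketch), removes it.

```python
import random
from collections import Counter


def simulate(n, seed=0):
    """Hypothetical binary data from the graph Z -> T, Z -> Y, T -> Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = int(rng.random() < 0.5)                       # confounder
        t = int(rng.random() < (0.8 if z else 0.2))       # Z -> T
        y = int(rng.random() < 0.2 + 0.3 * t + 0.4 * z)  # T -> Y (+0.3) and Z -> Y (+0.4)
        rows.append((z, t, y))
    return rows


def p_y1(rows, t, z=None):
    """Empirical P(Y=1 | T=t), or P(Y=1 | T=t, Z=z) when z is given."""
    ys = [y for zz, tt, y in rows if tt == t and (z is None or zz == z)]
    return sum(ys) / len(ys)


def p_y1_do(rows, t):
    """Backdoor adjustment: P(Y=1 | do(T=t)) = sum_z P(Y=1 | T=t, Z=z) * P(Z=z)."""
    n = len(rows)
    z_counts = Counter(z for z, _, _ in rows)
    return sum(p_y1(rows, t, z) * count / n for z, count in z_counts.items())


rows = simulate(200_000)
naive = p_y1(rows, 1) - p_y1(rows, 0)
adjusted = p_y1_do(rows, 1) - p_y1_do(rows, 0)
print(f"naive {naive:.3f} | backdoor-adjusted {adjusted:.3f} | true effect 0.300")
```

The naive contrast should come out near 0.54 (treated units have Z = 1 with probability 0.8 and untreated units with 0.2, so 0.3 + 0.4 × 0.6), and the adjusted one near 0.30.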
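Do-calculus extends this to more complex scenarios (e.g., mediation, selection bias). It’s a set of three rules that let you transform expressions involving interventions, such as $P(Y \mid do(T = t))$, into expressions containing only observational probabilities whenever the effect is identifiable from the graph.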
When to use: Whenever you have a plausible causal diagram—even a rough one. It forces you to make assumptions explicit.
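Because the backdoor criterion is a purely graphical check, it can be automated once the diagram is written down. The sketch below is a small, hypothetical helper (not a library API): it encodes a DAG as a dict mapping each node to its children, enumerates every path between treatment and outcome, and applies the d-separation blocking rules to each backdoor path. It enumerates paths exhaustively, so it only suits small diagrams; dedicated tools such as DAGitty handle larger ones. The example graph is an illustrative assumption combining a confounder Z, a mediator M, and an "M-bias" structure in which C is a collider.

```python
def descendants(dag, node):
    """All nodes reachable from `node` along directed edges."""
    seen, stack = set(), list(dag.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dag.get(current, []))
    return seen


def has_edge(dag, a, b):
    return b in dag.get(a, [])


def all_paths(dag, start, end):
    """Every simple path from start to end, ignoring edge direction."""
    nodes = set(dag) | {c for children in dag.values() for c in children}
    neighbors = {n: {m for m in nodes if has_edge(dag, n, m) or has_edge(dag, m, n)}
                 for n in nodes}
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        for nxt in neighbors[path[-1]]:
            if nxt == end:
                paths.append(path + [nxt])
            elif nxt not in path:
                stack.append(path + [nxt])
    return paths


def is_blocked(dag, path, z):
    """d-separation: is this single path blocked by conditioning set z?"""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        if has_edge(dag, prev, mid) and has_edge(dag, nxt, mid):  # collider prev -> mid <- nxt
            if mid not in z and not (descendants(dag, mid) & z):
                return True   # an unconditioned collider blocks the path
        elif mid in z:
            return True       # a conditioned chain or fork node blocks the path
    return False


def satisfies_backdoor(dag, treatment, outcome, z):
    z = set(z)
    if z & descendants(dag, treatment):
        return False          # Z may not contain descendants of the treatment
    backdoor_paths = [p for p in all_paths(dag, treatment, outcome)
                      if has_edge(dag, p[1], treatment)]  # first edge points INTO treatment
    return all(is_blocked(dag, p, z) for p in backdoor_paths)


# Hypothetical diagram: confounder Z, mediator M, and M-bias through collider C.
dag = {
    "Z": ["T", "Y"],
    "T": ["M"],
    "M": ["Y"],
    "A": ["T", "C"],
    "B": ["C", "Y"],
}
for adjustment in [{"Z"}, set(), {"Z", "C"}, {"Z", "C", "A"}, {"Z", "M"}]:
    print(sorted(adjustment), satisfies_backdoor(dag, "T", "Y", adjustment))
```

In this graph, adjusting for Z alone is enough. Adding the collider C reopens the path T ← A → C ← B → Y unless you also condition on A (or B), and adjusting for the mediator M fails because M is a descendant of T.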
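Step 3: Identification Strategies
When they apply: You need a credible source of exogenous variation, something that shifts who gets treated for reasons unrelated to the outcome.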
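Identification is about how you isolate the causal effect from everything else that moves with treatment. Three classic strategies: randomized experiments, where you assign treatment yourself; natural experiments, where some external event assigns it as if at random; and quasi-experimental designs such as instrumental variables, regression discontinuity, and difference-in-differences.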
When to use: Randomization is ideal but often impossible. Natural and quasi-experiments are your next best bets—but they require careful justification.
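Instrumental variables are one common quasi-experimental design. The sketch below is a hypothetical simulation, not real data: an unobserved confounder U drives both treatment and outcome, and a coin-flip instrument Z (think of a randomized encouragement) shifts treatment but, by assumption, affects the outcome only through treatment. The Wald estimator, cov(Z, Y) / cov(Z, T), uses only the instrument-driven variation in T and so recovers the true effect of 2, while the ordinary regression slope is pulled upward by U.

```python
import random
import statistics


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n, seed=0):
    """Hypothetical data: U confounds T and Y; Z shifts T but not Y directly."""
    rng = random.Random(seed)
    z, t, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                        # unobserved confounder
        zi = float(rng.random() < 0.5)             # instrument, e.g. a randomized encouragement
        ti = 2.0 * zi + u + rng.gauss(0, 1)        # treatment depends on Z and U
        yi = 2.0 * ti + 3.0 * u + rng.gauss(0, 1)  # true causal effect of T on Y is 2
        z.append(zi)
        t.append(ti)
        y.append(yi)
    return z, t, y


z, t, y = simulate(100_000)
naive_slope = cov(t, y) / cov(t, t)  # OLS slope of Y on T, biased by U
iv_wald = cov(z, y) / cov(z, t)      # Wald / IV estimator: instrument-driven variation only
print(f"OLS slope {naive_slope:.2f} | IV (Wald) {iv_wald:.2f} | true effect 2.00")
```

With these coefficients the naive slope should settle near 3, since cov(T, Y) = 2 var(T) + 3 cov(T, U) = 6 + 3 and var(T) = 3. The IV estimate is only as good as the exclusion restriction, which no amount of data can verify.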
Step 4: Estimating the Effect
When it applies: You believe you’ve measured enough confounders to satisfy ignorability, and you need to estimate the treatment effect.
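Once you’ve identified a set of covariates to condition on, you need to actually estimate the effect. Three workhorses: matching (pairing treated and untreated units with similar covariates or propensity scores), inverse probability weighting (reweighting each unit by the inverse of its estimated probability of receiving the treatment it actually got), and doubly-robust estimators such as augmented IPW (combining an outcome model with a propensity model).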
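When to use: When you have rich covariate data and a credible unconfoundedness assumption. Doubly-robust methods are increasingly preferred because they remain consistent if either the outcome model or the propensity model is correctly specified, and they accommodate flexible machine-learning models for either component.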
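Here is a minimal sketch of inverse probability weighting (IPW) and augmented IPW (AIPW), the most common doubly-robust estimator, under hypothetical assumptions: one observed three-level confounder x, treatment probabilities of 0.2, 0.5, and 0.8 by level, and a true ATE of 2. To illustrate double robustness, the outcome model is deliberately wrong (it ignores x and predicts raw group means); because the propensity model is correct, the AIPW correction term still removes the bias that the outcome model alone would leave.

```python
import random
import statistics
from collections import defaultdict


def simulate(n, seed=0):
    """Hypothetical data: x confounds treatment and outcome; true ATE is 2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.randrange(3)                           # observed confounder: 0, 1, or 2
        t = int(rng.random() < (0.2, 0.5, 0.8)[x])     # higher x -> more likely treated
        y = 1.0 + 2.0 * t + 3.0 * x + rng.gauss(0, 1)
        data.append((x, t, y))
    return data


def fit_propensity(data):
    """Estimate e(x) = P(T=1 | X=x) by the treated share in each stratum."""
    counts, treated = defaultdict(int), defaultdict(int)
    for x, t, _ in data:
        counts[x] += 1
        treated[x] += t
    return {x: treated[x] / counts[x] for x in counts}


def ipw(data, e):
    """Inverse probability weighting (Horvitz-Thompson form)."""
    return sum(t * y / e[x] - (1 - t) * y / (1 - e[x]) for x, t, y in data) / len(data)


def aipw(data, e, mu1, mu0):
    """Augmented IPW: outcome-model prediction plus an inverse-weighted residual correction."""
    return sum(
        mu1(x) - mu0(x)
        + t * (y - mu1(x)) / e[x]
        - (1 - t) * (y - mu0(x)) / (1 - e[x])
        for x, t, y in data
    ) / len(data)


data = simulate(100_000)
e = fit_propensity(data)

# Deliberately misspecified outcome model: ignores x and predicts raw group means.
m1 = statistics.fmean(y for _, t, y in data if t == 1)
m0 = statistics.fmean(y for _, t, y in data if t == 0)
naive = m1 - m0
ipw_est = ipw(data, e)
aipw_est = aipw(data, e, mu1=lambda x: m1, mu0=lambda x: m0)
print(f"naive {naive:.2f} | IPW {ipw_est:.2f} | AIPW {aipw_est:.2f} | true ATE 2.00")
```

The naive difference (exactly what the misspecified outcome model predicts) should come out near 4.4, and both weighted estimates near 2. In this fully discrete setup IPW and AIPW coincide with a simple stratified comparison up to rounding; with continuous covariates and fitted models they generally differ, and AIPW stays consistent as long as at least one of its two models is correct.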
Step 5: Sensitivity Analysis
When it applies: After you’ve estimated a causal effect, before you publish or act on it.
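No causal estimate is assumption-free. Sensitivity analysis quantifies how robust your result is to violations of those assumptions, for example by asking how strong an unmeasured confounder would have to be to explain the estimate away. Common tools include Rosenbaum bounds for matched studies and the E-value.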
When to use: Always. If you can’t stress-test, you can’t trust.
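One widely used sensitivity measure is the E-value of VanderWeele and Ding (2017): the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both the treatment and the outcome, beyond the measured covariates, to fully explain away an observed risk ratio. For RR ≥ 1 it equals RR + sqrt(RR × (RR - 1)), and protective estimates (RR < 1) are inverted first. A minimal sketch, with hypothetical function names:

```python
import math


def e_value(rr):
    """E-value for a point estimate on the risk-ratio scale (VanderWeele & Ding, 2017)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr   # protective effects: work with the reciprocal
    return rr + math.sqrt(rr * (rr - 1))


def e_value_ci(lower, upper):
    """E-value for the confidence limit closest to the null (RR = 1)."""
    if lower <= 1 <= upper:
        return 1.0    # the interval already includes "no effect"
    return e_value(lower if lower > 1 else upper)


print(f"{e_value(2.0):.2f}")          # observed RR of 2
print(f"{e_value_ci(1.5, 3.0):.2f}")  # a hypothetical confidence interval around it
```

For example, an observed risk ratio of 2 gives an E-value of 2 + √2 ≈ 3.41: only a confounder associated with both treatment and outcome by risk ratios of about 3.4 or more could fully explain it away. Reporting the E-value for the confidence limit as well tells readers how much confounding it would take to make the result statistically indistinguishable from no effect.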
Whichever framework and identification strategy you use, finish with sensitivity analysis (Step 5).
“Correlation is not causation” is a warning, not a conclusion. The tools to move beyond it exist—and they’re more accessible than ever. Start with a DAG, choose an identification strategy, estimate carefully, and stress-test relentlessly.
Causation is hard. But it’s not impossible.