Causal Inference: From Correlation to Causation in Five Steps

DoOperator Research · May 13, 2026

Hook: The Dirty Secret of Statistics 101

Every introductory stats course hammers home the mantra: "Correlation is not causation." It’s repeated so often it becomes a reflex—a way to sound sophisticated when someone presents a surprising chart. But here’s the dirty secret: most courses never tell you what to do instead. They teach you to identify the sin, but not how to achieve redemption.

This post fixes that. We’ll walk through five steps for getting from correlation to causation, not as abstract philosophy, but as a practical toolkit you can use in your next analysis.


The Fundamental Problem: The Counterfactual You’ll Never See

Causal inference is hard because of a single, maddening fact: we never observe the counterfactual.

If a patient takes a drug and recovers, we don’t know what would have happened if they hadn’t taken it. That unobserved reality—the counterfactual—is the gold standard for causation. Without it, we’re left comparing treated and untreated groups, hoping they’re exchangeable. Spoiler: they rarely are.

The entire field of causal inference is built on clever ways to approximate that missing counterfactual. Here’s how.


1. The Potential Outcomes Framework (Rubin)

When it applies: You have a well-defined treatment and control group, and you care about individual-level or average treatment effects.

The Rubin Causal Model formalizes the counterfactual problem. For each unit i, there are two potential outcomes: $Y_i(1)$ if treated, $Y_i(0)$ if not. The causal effect for unit i is $Y_i(1) - Y_i(0)$. Problem: you only observe one.

Key assumptions:

  • SUTVA (Stable Unit Treatment Value Assumption): No interference between units (your treatment doesn’t affect my outcome) and no hidden versions of treatment.
  • Ignorability (unconfoundedness): Treatment assignment is independent of potential outcomes, given covariates.

From here, we define:

  • ATE (Average Treatment Effect): $E[Y(1) - Y(0)]$
  • ATT (Average Treatment Effect on the Treated): $E[Y(1) - Y(0) \mid T=1]$
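
To make these definitions concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the `severity` confounder, the constant effect of 0.5, and the rule that sicker units are more likely to be treated. Only a simulation can show both potential outcomes for each unit, which is what lets us compare the true ATE and ATT with the naive treated-versus-untreated difference.

```python
import random
from statistics import mean

def simulate_units(n=20000, seed=0):
    """Simulate units where BOTH potential outcomes are visible.

    This is only possible in a simulation; real data reveals one per unit.
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        severity = rng.random()                    # hypothetical confounder in [0, 1]
        y0 = 1.0 - severity + rng.gauss(0, 0.1)    # potential outcome without treatment
        y1 = y0 + 0.5                              # potential outcome with treatment
        treated = rng.random() < severity          # sicker units are treated more often
        units.append((treated, y0, y1))
    return units

units = simulate_units()

# Quantities that need both potential outcomes (unobservable in practice).
true_ate = mean(y1 - y0 for _, y0, y1 in units)
true_att = mean(y1 - y0 for t, y0, y1 in units if t)

# What an analyst actually observes: y1 for treated units, y0 for untreated ones.
naive = (mean(y1 for t, _, y1 in units if t)
         - mean(y0 for t, y0, _ in units if not t))

print(f"true ATE = {true_ate:.3f}, true ATT = {true_att:.3f}, naive = {naive:.3f}")
```

By construction the true effect is 0.5 for every unit, while the naive comparison comes out far smaller because treated units started out sicker. The gap between the two is the confounding bias that the rest of the toolkit tries to remove.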

When to use: This is the default framework for most applied work—especially in economics, medicine, and policy evaluation.


2. The Graphical Framework (Pearl)

When it applies: You have a causal diagram (or can sketch one), and you need to decide which variables to condition on—or avoid.

Judea Pearl’s Directed Acyclic Graphs (DAGs) give you a visual language for causal assumptions. Each arrow represents a direct causal effect. The key insight: you can read off identification strategies from the graph.

The Backdoor Criterion: A set of variables Z satisfies the backdoor criterion if:

  1. No node in Z is a descendant of the treatment.
  2. Z blocks every path between treatment and outcome that has an arrow pointing into treatment.

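If Z satisfies both conditions, adjusting for Z (by stratification, regression, matching, or weighting) identifies the causal effect, provided the diagram itself is correct.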
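
A small sketch of backdoor adjustment in practice, assuming the simplest possible diagram: a single binary confounder Z with arrows into both treatment T and outcome Y. The data-generating numbers (a true effect of 2.0, treatment probabilities of 0.8 and 0.2) are invented for illustration. The adjusted estimate averages the within-stratum differences over the distribution of Z, i.e. $\sum_z P(Z=z)\,(E[Y \mid T=1, Z=z] - E[Y \mid T=0, Z=z])$.

```python
import random
from statistics import mean

def simulate(n=40000, seed=1):
    """Hypothetical DAG: Z -> T, Z -> Y, T -> Y (true effect of T on Y is 2.0)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5
        t = rng.random() < (0.8 if z else 0.2)     # Z makes treatment more likely
        y = 2.0 * t + 3.0 * z + rng.gauss(0, 1)    # Z also raises the outcome
        rows.append((z, t, y))
    return rows

def backdoor_adjusted_effect(rows):
    """Average the within-stratum treated-minus-control difference over P(Z)."""
    effect = 0.0
    for z in (False, True):
        stratum = [(t, y) for zz, t, y in rows if zz == z]
        p_z = len(stratum) / len(rows)
        y_treated = mean(y for t, y in stratum if t)
        y_control = mean(y for t, y in stratum if not t)
        effect += p_z * (y_treated - y_control)
    return effect

rows = simulate()
naive = (mean(y for _, t, y in rows if t)
         - mean(y for _, t, y in rows if not t))
adjusted = backdoor_adjusted_effect(rows)

print(f"naive = {naive:.2f}, backdoor-adjusted = {adjusted:.2f}")
```

In this setup the naive difference absorbs the open backdoor path through Z and overshoots the true effect, while the stratified estimate recovers something close to 2.0. With many or continuous covariates, stratification stops being practical and regression or weighting takes its place.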

Do-calculus extends this to more complex scenarios (e.g., mediation, selection bias). It’s a set of three rules that let you transform expressions involving interventions ($do(X)$) into observable quantities.

When to use: Whenever you have a plausible causal diagram—even a rough one. It forces you to make assumptions explicit.


3. Identification Strategies: Where the Magic Happens

When they apply: You need a credible source of exogenous variation.

Identification is about how you isolate the causal effect. Three classic strategies:

  • Randomization: The gold standard. Random assignment breaks the link between treatment and confounders. If you can randomize, you can estimate ATE directly.
  • Natural experiments: When nature or policy randomizes for you. Examples: lottery wins, draft lotteries, weather shocks, or cutoff rules in regression discontinuity designs.
  • Quasi-experiments: Methods like difference-in-differences, instrumental variables, and regression discontinuity. Each relies on different assumptions (parallel trends, exclusion restriction, continuity).
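
As one worked example from the list above, here is the arithmetic behind difference-in-differences. The group labels and outcome values are made-up numbers, not real data, and the estimate is only causal if the parallel-trends assumption holds, meaning the treated group would have changed by the same amount as the control group had it not been treated.

```python
from statistics import mean

# Hypothetical outcomes before and after a policy change (illustrative only).
outcomes = {
    ("treated", "pre"):  [10.0, 12.0, 11.0],
    ("treated", "post"): [15.0, 17.0, 16.0],
    ("control", "pre"):  [8.0, 9.0, 10.0],
    ("control", "post"): [10.0, 11.0, 12.0],
}

def diff_in_diff(outcomes):
    """(post - pre) change in the treated group minus the same change in control."""
    change_treated = (mean(outcomes[("treated", "post")])
                      - mean(outcomes[("treated", "pre")]))
    change_control = (mean(outcomes[("control", "post")])
                      - mean(outcomes[("control", "pre")]))
    return change_treated - change_control

did = diff_in_diff(outcomes)
print(did)  # 3.0
```

The treated group rose by 5 and the control group by 2, so the estimate attributes the remaining 3 to the policy. The control group's change stands in for the counterfactual trend of the treated group.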

When to use: Randomization is ideal but often impossible. Natural and quasi-experiments are your next best bets—but they require careful justification.


4. Estimation Under Unconfoundedness

When it applies: You believe you’ve measured enough confounders to satisfy ignorability, and you need to estimate the treatment effect.

Once you’ve identified a set of covariates to condition on, you need to actually estimate the effect. Three workhorses:

  • Propensity Score Matching (PSM): Estimate the probability of treatment given covariates, then match treated and untreated units with similar scores. Reduces dimensionality, but sensitive to misspecification.
  • Inverse Probability Weighting (IPW): Weight each unit by $1/\text{propensity score}$ for treated, $1/(1-\text{propensity score})$ for control. Creates a pseudo-population where treatment is independent of covariates.
  • Doubly-Robust Estimation: Combines outcome regression and propensity score modeling. It’s consistent if either model is correct—giving you two chances to be right.
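
The sketch below illustrates IPW and one common doubly-robust estimator, augmented IPW (AIPW), on simulated data. The setup is an assumption made for illustration: one binary covariate `z`, a true effect of 2.0, and a propensity score estimated by simple stratum frequencies rather than a fitted model. The outcome model is deliberately wrong (it ignores `z`), to show that AIPW can still land near the truth when the propensity model is right.

```python
import random
from statistics import mean

def simulate(n=40000, seed=2):
    """Hypothetical data: binary covariate z drives both treatment and outcome."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = int(rng.random() < 0.5)
        t = int(rng.random() < (0.8 if z else 0.2))
        y = 2.0 * t + 3.0 * z + rng.gauss(0, 1)   # true effect of t is 2.0
        data.append((z, t, y))
    return data

def fit_propensity(data):
    """Propensity e(z) = P(T=1 | Z=z), estimated by stratum frequencies."""
    return {z: mean(t for zz, t, _ in data if zz == z) for z in (0, 1)}

def ipw_ate(data, e):
    """Weight treated units by 1/e(z) and controls by 1/(1 - e(z))."""
    return mean(t * y / e[z] - (1 - t) * y / (1 - e[z]) for z, t, y in data)

def aipw_ate(data, e, m1, m0):
    """Augmented IPW: outcome-model prediction plus a weighted residual correction."""
    return mean(
        m1(z) - m0(z)
        + t * (y - m1(z)) / e[z]
        - (1 - t) * (y - m0(z)) / (1 - e[z])
        for z, t, y in data
    )

data = simulate()
e = fit_propensity(data)

# Deliberately misspecified outcome model: group means that ignore z entirely.
mean_treated = mean(y for _, t, y in data if t == 1)
mean_control = mean(y for _, t, y in data if t == 0)

ate_ipw = ipw_ate(data, e)
ate_aipw = aipw_ate(data, e, lambda z: mean_treated, lambda z: mean_control)

print(f"misspecified outcome model alone = {mean_treated - mean_control:.2f}")
print(f"IPW = {ate_ipw:.2f}, AIPW = {ate_aipw:.2f}")
```

On its own, the misspecified outcome model reproduces the confounded group difference. Both weighted estimators correct it, because the propensity model is right. In real analyses the propensity score would usually come from a fitted model such as logistic regression, and extreme weights near 0 or 1 deserve a close look.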

When to use: When you have rich covariate data and a credible unconfoundedness assumption. Doubly-robust methods are increasingly preferred for their flexibility.


5. Sensitivity Analysis: Stress-Testing Your Causal Claim

When it applies: After you’ve estimated a causal effect, before you publish or act on it.

No causal estimate is assumption-free. Sensitivity analysis quantifies how robust your result is to violations of those assumptions.

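  • E-value: How strong would an unmeasured confounder need to be (on the risk ratio scale) to explain away your observed effect? An E-value of 1.5 means a confounder would need to be associated with both the treatment and the outcome by a risk ratio of at least 1.5 each, beyond the measured covariates, to fully account for the observed association. Compare it against the strength of the confounders you did measure to judge whether such a confounder is plausible.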
  • Rosenbaum Bounds: For matched studies, how much hidden bias would be needed to change your conclusion? If your result is sensitive to small bias, treat it with caution.
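
For a point estimate on the risk ratio scale, the E-value has a closed form (VanderWeele and Ding): $E = RR + \sqrt{RR \times (RR - 1)}$ for $RR > 1$, with protective effects ($RR < 1$) inverted first. The helper below is a minimal sketch of that formula only. The function name is our own, and it does not handle confidence limits or other effect measures.

```python
import math

def e_value(rr):
    """E-value for an observed risk ratio (point estimate only)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr          # treat protective effects symmetrically
    return rr + math.sqrt(rr * (rr - 1.0))

print(e_value(1.125))  # 1.5
print(round(e_value(2.0), 3))  # 3.414
```

An observed risk ratio of 1.125 therefore corresponds to the E-value of 1.5 used as the example above, while a risk ratio of 2 would need a confounder associated with both treatment and outcome at roughly 3.4 to be explained away.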

When to use: Always. If you can’t stress-test, you can’t trust.


Decision Guide: Three Questions to Choose Your Framework

  1. Can you randomize? → Use randomization (Step 3), estimate via simple difference in means.
  2. Do you have a credible natural experiment? → Use IV, DiD, or RDD (Step 3).
  3. If not, can you measure enough confounders? → Use DAGs to identify adjustment set (Step 2), then estimate via PSM/IPW/doubly-robust (Step 4).

After any of these → sensitivity analysis (Step 5).


Common Mistakes (And How to Avoid Them)

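  • Conditioning on a collider: A collider is a common effect, a variable with two arrows pointing into it on a path (for example, one caused by both treatment and outcome). Conditioning on it, or on one of its descendants, opens a non-causal path that was otherwise blocked and induces a spurious association. Fix: Use DAGs to spot colliders and avoid conditioning on them.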
  • Ignoring selection bias: If inclusion in your sample depends on the outcome, or on something affected by both treatment and outcome, your estimates can be biased. Fix: Use Heckman-type corrections or bounds.
  • Confusing mediation with confounding: A mediator is on the causal path; a confounder is a common cause. Adjusting for a mediator can block the very effect you’re trying to estimate. Fix: Only adjust for confounders, not mediators.
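
Collider bias is easy to see in a simulation. In the sketch below, `x` and `y` are independent by construction and `c` is a hypothetical collider caused by both. The variable names, noise levels, and the selection threshold of 1.0 are all illustrative assumptions. Restricting the sample to high values of `c` is one way of conditioning on it.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var_x = sum((a - mx) ** 2 for a in xs)
    var_y = sum((b - my) ** 2 for b in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(3)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]     # "treatment", independent of y
y = [rng.gauss(0, 1) for _ in range(n)]     # "outcome", no effect from x
c = [a + b + rng.gauss(0, 0.5) for a, b in zip(x, y)]   # collider: x -> c <- y

corr_all = pearson(x, y)

# Conditioning on the collider by keeping only units with a high value of c.
selected = [(a, b) for a, b, ci in zip(x, y, c) if ci > 1.0]
corr_selected = pearson([s[0] for s in selected], [s[1] for s in selected])

print(f"full sample: {corr_all:.2f}, selected on collider: {corr_selected:.2f}")
```

The full-sample correlation sits near zero, as it should. Within the selected subset a clear negative correlation appears even though neither variable affects the other, which is exactly the spurious association the first bullet warns about.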

The Bottom Line

“Correlation is not causation” is a warning, not a conclusion. The tools to move beyond it exist—and they’re more accessible than ever. Start with a DAG, choose an identification strategy, estimate carefully, and stress-test relentlessly.

Causation is hard. But it’s not impossible.
