Markov Decision Processes with Unobserved Confounders: A Causal Approach — DoOperator Research

Authors	Junzhe Zhang, Elias Bareinboim
Journal	CausalAI Lab Technical Report R-23
Year	2016

What Problem It Solves

Standard MDP estimation assumes observed state captures the information needed for valid transition and reward learning; unobserved confounders break that assumption.

What problem it solves

Standard MDP estimation assumes observed state captures the information needed for valid transition and reward learning; unobserved confounders break that assumption.

How it works

The paper formulates sequential decision problems in causal-model terms, clarifying which policy values are identifiable under hidden confounding.

When to use it

Use as the foundational citation whenever an RL method claims robustness to hidden confounding in sequential decisions.

Limitations and failure modes

The formulation highlights identifiability limits; it does not make unobserved confounding disappear without additional structure or data.

Read full paper →More Causal Inference →