| Authors | Cameron Hepburn, Brian O’Callaghan, Nicholas Stern, Joseph E. Stiglitz, Dimitri Zenghelis |
| Journal | Oxford Review of Economic Policy |
| Year | 2020 |
| DOI | 10.1093/oxrep/graa015 |
| Citations | 756 |
TL;DR
A survey of 231 economic experts from G20 countries identified five fiscal recovery policies—clean physical infrastructure, building efficiency retrofits, education/training investment, natural capital investment, and clean R&D—that simultaneously deliver high economic multipliers and strong climate benefits, offering a roadmap for governments designing post-COVID stimulus packages.
The researchers tested the perceived performance of 25 distinct fiscal recovery policy archetypes across four dimensions: speed of implementation (how quickly the policy can be deployed), economic multiplier (the total economic output generated per unit of government spending), climate impact potential (the net effect on greenhouse gas emissions, positive or negative), and overall desirability (a composite judgment of the policy's attractiveness given current conditions). The 25 archetypes ranged from traditional stimulus measures (e.g., road building, airline bailouts) to climate-focused interventions (e.g., renewable energy subsidies, reforestation, clean R&D tax credits). The study also compared results between high-income countries and lower- and middle-income countries (LMICs) to identify context-specific priorities.
The sample consisted of 231 central bank officials, finance ministry officials, and other economic experts from G20 countries. The authors do not provide a detailed demographic breakdown (age, gender, years of experience), but the respondents were selected for their professional expertise in fiscal policy and economic recovery. The survey was conducted in May–June 2020, during the early phase of the COVID-19 pandemic. The response rate is not reported, and the authors acknowledge that the sample is a convenience sample of experts willing to participate, not a random or representative sample of all G20 economic officials.
The study used a structured online survey. For each of the 25 policy archetypes, respondents rated the policy on four dimensions using a 5-point Likert scale (1 = very low, 5 = very high). The survey also included open-ended questions allowing respondents to add comments or suggest additional policies. The authors then calculated mean scores for each policy on each dimension, and ranked policies by their combined score on economic multiplier and climate impact potential. For the climate impact dimension, respondents were asked to consider the net effect on greenhouse gas emissions, accounting for both direct and indirect effects (e.g., a road-building project might increase emissions directly through construction and indirectly through induced driving). The authors also conducted a separate analysis for LMICs by asking respondents to rate policies specifically in the context of lower- and middle-income countries.
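The scoring step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the policy names and ratings are invented, and the "combined score" is assumed here to be the simple sum of the mean multiplier and mean climate ratings, since the summary does not say how the two dimensions were combined.

```python
from statistics import mean

# Hypothetical ratings: policy -> dimension -> list of 1-5 Likert responses.
# These numbers are invented for illustration; they are NOT the survey data.
ratings = {
    "clean_infrastructure": {"multiplier": [5, 4, 4], "climate": [5, 5, 4]},
    "airline_bailout": {"multiplier": [2, 3, 2], "climate": [1, 2, 1]},
    "road_building": {"multiplier": [4, 4, 3], "climate": [2, 2, 1]},
}


def mean_scores(ratings):
    """Return the mean rating of each policy on each dimension."""
    return {
        policy: {dim: mean(values) for dim, values in dims.items()}
        for policy, dims in ratings.items()
    }


def rank_by_combined(ratings, dims=("multiplier", "climate")):
    """Rank policies by the sum of their mean scores on `dims` (assumed rule)."""
    means = mean_scores(ratings)
    combined = {policy: sum(m[d] for d in dims) for policy, m in means.items()}
    return sorted(combined, key=combined.get, reverse=True)


ranking = rank_by_combined(ratings)
print(ranking)
```

With these invented inputs, the policy rated highly on both dimensions ranks first, and the one rated low on both ranks last. A different combination rule (for example, a weighted sum) could reorder the middle of the list.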
Study design: This is a cross-sectional expert elicitation survey. It is not an experiment, a randomized controlled trial, or a meta-analysis. The researchers did not manipulate any variables or assign participants to conditions. Instead, they collected subjective judgments from a panel of experts at a single point in time.
Why this design matters: Expert elicitation is a well-established method in economics and policy analysis when direct empirical data is unavailable or when the question involves future-oriented, counterfactual scenarios (e.g., "What would be the economic multiplier of a hypothetical green infrastructure program?"). The COVID-19 crisis was unfolding in real time, and no historical data existed on the effects of pandemic-era fiscal recovery packages on climate outcomes. The survey allowed the researchers to aggregate the collective judgment of experts who had relevant domain knowledge.
What this design can and cannot prove: This design can reveal the perceived performance of different policies among a group of experts. It can identify policies that are broadly viewed as having high potential on multiple dimensions. It can also highlight areas of consensus and disagreement. However, this design cannot prove that any particular policy actually produces a given economic multiplier or climate impact. The results are opinions, not empirical measurements. The study does not test any causal hypothesis—it does not show that clean infrastructure causes higher economic growth or lower emissions. It shows that experts believe it would. The design also cannot account for implementation failures, political constraints, or unintended consequences that might differ from expert expectations.
Duration: The survey was administered at a single time point (May–June 2020). There is no follow-up or longitudinal component. The results reflect expert views during a specific moment of the pandemic, when uncertainty was high and many governments were still designing their initial recovery packages.
Statistical approach: The authors report mean scores and rankings. They do not report standard deviations, confidence intervals, or formal statistical tests (e.g., t-tests, ANOVA) comparing policies. This is a descriptive analysis, not an inferential one. The lack of inferential statistics means we cannot assess whether the differences between policies are statistically significant or could be due to random variation in expert opinions.
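To make concrete what an inferential check could look like, here is a hedged sketch of a percentile bootstrap confidence interval for the difference in mean ratings between two policies. It assumes access to respondent-level ratings, which the summary does not say are available, and it treats the two sets of ratings as independent samples even though the same experts rated every policy. The function name and all data are hypothetical.

```python
import random
from statistics import mean


def bootstrap_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b).

    Assumption: a and b are treated as independent samples. A paired
    design would resample respondents rather than individual ratings.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        resample_a = [rng.choice(a) for _ in a]
        resample_b = [rng.choice(b) for _ in b]
        diffs.append(mean(resample_a) - mean(resample_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Invented Likert ratings for two policies (NOT the survey data).
a = [5, 5, 4, 5, 4, 5, 4, 5]
b = [1, 2, 1, 2, 1, 1, 2, 1]
low, high = bootstrap_diff_ci(a, b)
print(low, high)
```

If the resulting interval excludes zero, the gap between the two policies is unlikely to be an artifact of sampling variation among respondents. An interval that straddles zero would suggest the ranking between them is not reliable.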
Major methodological weaknesses:
Top five policies for combined economic multiplier and climate impact (high-income countries):
Policies with high economic multiplier but low climate impact (or negative impact):
Policies with low economic multiplier but high climate impact:
Differences for lower- and middle-income countries (LMICs):
Overall desirability ranking (combining all four dimensions):
Policies ranked lowest on overall desirability:
Short-run impacts of COVID-19 on emissions: The authors note that daily global CO2 emissions fell by an estimated 17% at the peak of lockdowns in April 2020, but the drop was temporary and emissions rebounded quickly as restrictions eased. They estimate that the lockdown-driven reduction was equivalent to roughly 2–3 years of the annual emissions reductions needed to meet the Paris Agreement targets, but warn that without structural changes, emissions will return to pre-pandemic levels or higher.
Medium-run behavioral shifts: The authors discuss plausible but uncertain shifts in human behavior post-pandemic, including increased remote work (reducing commuting emissions), reduced air travel (especially business travel), and increased demand for local food and outdoor recreation. They caution that these shifts are not guaranteed and depend on policy choices (e.g., whether governments invest in broadband infrastructure to support remote work).
The study does not report effect sizes in the traditional sense (e.g., Cohen's d, risk ratios). Instead, the key "effect" is the ranking of policies by expert consensus. The magnitude of the difference between the top-ranked policy (clean physical infrastructure) and the bottom-ranked policy (fossil fuel subsidies) is not quantified in standard deviation units or percentage points. However, the authors provide qualitative context: clean physical infrastructure was rated "substantially higher" than fossil fuel subsidies on both economic multiplier and climate impact, with the gap being large enough that the authors recommend governments prioritize the former and avoid the latter.
For the emissions reduction from lockdowns, the magnitude is clear: a 17% drop in daily global CO2 emissions at peak, but this was temporary and not sustained. To put this in perspective, the annual emissions reduction needed to meet the Paris Agreement targets is roughly 7.6% per year from 2020 to 2030. The lockdown-driven reduction was about twice that annual target, but it lasted only a few weeks.
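The comparison between the lockdown drop and the annual target is simple arithmetic, sketched below. The two percentages come from the text above. The compounding calculation is an added assumption: it reads "7.6% per year from 2020 to 2030" as ten successive year-on-year cuts.

```python
peak_daily_drop = 0.17       # peak drop in daily global CO2 emissions (from the text)
required_annual_cut = 0.076  # annual cut cited for the Paris targets (from the text)

# How many "annual targets" the peak lockdown drop corresponds to.
ratio = peak_daily_drop / required_annual_cut

# Assumption: the 7.6% cut compounds year on year for ten years.
years = 10
remaining_share = (1 - required_annual_cut) ** years

print(f"peak drop / annual target: {ratio:.2f}")
print(f"emissions remaining after {years} annual cuts: {remaining_share:.1%}")
```

The ratio comes out a little above 2, which matches the "about twice" statement. Under the compounding assumption, emissions would need to fall to a bit under half of their starting level over the decade, which shows why a drop lasting a few weeks does not substitute for sustained cuts.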
Acknowledged by authors:
Additional limitations a critical reader would note:
For someone running their own n=1 experiment (e.g., a policymaker, activist, or investor testing the effectiveness of different advocacy strategies for green recovery):
What to test:
Minimum meaningful duration:
What to measure (specific metrics):
Key confounds to control for:
What a positive result would look like: