The effect of heart rate variability biofeedback training on stress and anxiety: a meta-analysis. — DoOperator Research

Authors	Goessl VC, Curtiss JE, Hofmann SG
Journal	Psychol Med
Year	2017
DOI	10.1017/S0033291717001003
Citations	595

TL;DR

Heart rate variability (HRV) biofeedback training produces a large reduction in self-reported stress and anxiety (between-group effect size g = 0.83). The average person who trained ended up less stressed or anxious than roughly 80% of the control group, making it a promising self-administered intervention you can test at home with a wearable device.

What they tested

This is a meta-analysis — a statistical synthesis of 24 separate studies — testing whether HRV biofeedback training reduces self-reported stress and anxiety symptoms.

Intervention: HRV biofeedback training. Participants wore a device that measured their heart rate variability in real time (usually via a chest strap or finger sensor) and received visual or auditory feedback to help them slow their breathing to a specific "resonance frequency" (typically around 6 breaths per minute, or 0.1 Hz). The goal was to maximise the amplitude of their heart rate oscillations — a state sometimes called "cardiac coherence" or "resonant breathing."

Comparators: 13 of the 24 studies included a control group. Control conditions included:

Waitlist (no treatment)
Standard care or treatment as usual
Sham biofeedback (a device that appeared active but gave no real feedback)
Progressive muscle relaxation
Daily thought record

Outcome measures: All studies used validated self-report questionnaires for stress and/or anxiety. Common instruments included:

State-Trait Anxiety Inventory (STAI-S for state anxiety, STAI-T for trait anxiety)
Beck Anxiety Inventory (BAI)
Depression Anxiety Stress Scales (DASS)
Hospital Anxiety and Depression Scale (HADS)
Self-report stress scales

Primary outcome: Change in self-reported stress or anxiety from pre- to post-treatment.

Secondary analyses: Moderator analyses tested whether effects differed by number of sessions, gender, clinical diagnosis, study year, or risk of bias.

Who was studied

Total participants: 484 individuals across 24 studies
Sample sizes per study: ranged from 5 to 106 participants (median ~13)
Population mix: 14 studies recruited from community (non-clinical) settings; 10 studies recruited from clinical settings
Clinical populations included: perinatal depression, COPD, cardiac surgery patients, borderline personality disorder, speech anxiety, performance anxiety, sympathetic over-arousal
Age range: mean ages from 19 to 66 years across studies
Gender: percentage female ranged from 0% to 100% across studies
Countries and regions: USA (most studies), plus Korea, EU countries, India, Africa, Indonesia
Number of sessions: ranged from 1 to 50 sessions (median ~5 sessions)

How they measured it

The meta-analysis extracted data from studies that used the following validated self-report instruments:

State-Trait Anxiety Inventory (STAI): Two 20-item scales. STAI-S measures "right now" anxiety (score range 20–80, higher = more anxious). STAI-T measures habitual anxiety (same range). Used in 12 of 24 studies.
Beck Anxiety Inventory (BAI): 21 items measuring physical and cognitive anxiety symptoms over the past week (score range 0–63, higher = more anxious). Used in 2 studies.
Depression Anxiety Stress Scales (DASS): 42 items measuring depression, anxiety, and stress (each subscale 0–42, higher = worse). Used in 1 study.
Hospital Anxiety and Depression Scale (HADS): 14 items, anxiety subscale 0–21 (higher = more anxious). Used in 1 study.
Other instruments: DSP (stress symptoms), self-report scales for speech anxiety, and custom stress measures.

Important limitation: All outcomes were self-report. No study used objective physiological stress markers (e.g., cortisol, heart rate during stress challenge) as a primary outcome, though some measured HRV itself as a manipulation check.

Methodology

Study design: This is a meta-analysis — a quantitative synthesis of existing studies. The authors searched PubMed, PsycINFO, and the Cochrane Library using systematic search terms. From 2,297 initial results, they screened 1,801 unique records and included 24 studies meeting pre-specified criteria.

Inclusion criteria:

At least one treatment condition was HRV biofeedback
Used a psychometrically adequate measure of self-reported stress or anxiety
Sample aged 18+
Sufficient descriptive statistics to compute effect sizes

Exclusion criteria:

Reviews, meta-analyses, surveys, manuals, conference abstracts
Other biofeedback types (EMG, EEG)
HRV biofeedback combined with another active treatment (e.g., CBT, mindfulness, progressive muscle relaxation)

Statistical approach:

Random-effects meta-analysis (assumes true effect varies across studies)
Effect size: Hedges' g (corrects for small sample bias)
Pre-post correlation assumed r = 0.70 for within-group analyses
Both within-group (pre-post) and between-group (biofeedback vs. control) effect sizes calculated
Moderator analyses using between-group heterogeneity statistic (QB) and meta-regression
Publication bias assessed via funnel plot, fail-safe N, trim-and-fill method, and Egger's regression
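
The effect-size and pooling steps listed above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the DerSimonian-Laird estimator for between-study variance is one common choice and is an assumption here (this summary does not say which estimator the paper used), the function names are hypothetical, and the sign convention (positive g means the treatment group ended with lower scores) is chosen for readability.

```python
import math


def hedges_g(m_treat, m_ctrl, sd_treat, sd_ctrl, n_treat, n_ctrl):
    """Return (g, variance of g) for two independent groups.

    Assumed sign convention: lower scores are better, so g is positive
    when the treatment group ends with the lower mean.
    """
    df = n_treat + n_ctrl - 2
    pooled_sd = math.sqrt(
        ((n_treat - 1) * sd_treat**2 + (n_ctrl - 1) * sd_ctrl**2) / df
    )
    d = (m_ctrl - m_treat) / pooled_sd
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n_treat + n_ctrl) / (n_treat * n_ctrl) + d**2 / (2 * (n_treat + n_ctrl))
    return j * d, j**2 * var_d


def random_effects_pool(effects, variances):
    """DerSimonian-Laird pooling. Returns (pooled effect, standard error, tau^2)."""
    if len(effects) < 2:
        raise ValueError("need at least two studies to estimate tau^2")
    w = [1 / v for v in variances]
    fixed = sum(wi * gi for wi, gi in zip(w, effects)) / sum(w)
    q = sum(wi * (gi - fixed) ** 2 for wi, gi in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * gi for wi, gi in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star)), tau2


# Made-up post-treatment data for three small studies (not from the paper):
# (mean_treat, mean_ctrl, sd_treat, sd_ctrl, n_treat, n_ctrl)
studies = [(40, 48, 10, 10, 10, 10), (38, 44, 9, 11, 15, 14), (42, 45, 12, 12, 8, 9)]
gs, vs = zip(*(hedges_g(*s) for s in studies))
pooled, se, tau2 = random_effects_pool(gs, vs)
print(f"pooled g = {pooled:.2f}, "
      f"95% CI = [{pooled - 1.96 * se:.2f}, {pooled + 1.96 * se:.2f}], tau^2 = {tau2:.3f}")
```

With samples this small the correction factor J matters: at 10 participants per group it shrinks d by about 4%.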

Risk of bias assessment: Used Cochrane Handbook criteria across four domains:

Sequence generation (randomisation adequacy)
Allocation concealment (whether upcoming group assignments were hidden from those enrolling participants)
Incomplete outcome data (handling of dropouts)
Selective outcome reporting (reporting all measured outcomes)

Studies were rated overall as low, unclear, or high risk of bias.

What this design can prove:

A meta-analysis increases statistical power by pooling data across studies
Can detect overall patterns that individual small studies might miss
Moderator analyses can identify factors that influence treatment effectiveness
Random-effects models account for between-study variability, making results more generalisable

What this design cannot prove:

Cannot establish causality more firmly than the included studies do: the quality of the meta-analysis depends entirely on the quality of the studies it pools
Cannot control for confounds that individual studies failed to address
Cannot determine optimal dosing (number of sessions, session length, breathing rate) with precision — only broad patterns
Cannot assess long-term effects beyond what individual studies measured (most were short-term)
Publication bias remains a concern despite statistical corrections

Major methodological weaknesses:

Small sample sizes: Individual studies ranged from 5 to 106 participants; many had fewer than 20. Small studies tend to overestimate effect sizes.
Lack of blinding: HRV biofeedback is difficult to blind — participants know they're receiving feedback. Only 2 studies used sham biofeedback as a control.
Self-report only: No objective stress measures (cortisol, heart rate reactivity, behavioural measures)
High risk of bias: Many studies rated "unclear" or "high" risk on multiple domains
Heterogeneity: Studies varied widely in populations, number of sessions, devices used, and outcome measures
Pre-post correlation assumption: The authors assumed r = 0.70 for within-group analyses, which may not be accurate for all studies
Few active comparators: Most control conditions were waitlist or treatment-as-usual, not an active alternative treatment
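
To see why the assumed pre-post correlation matters, here is a rough sketch using a common textbook formulation for paired (pre-post) designs. It is an assumption that the paper's within-group computation followed this form, and the function name and numbers are illustrative only. When pre and post standard deviations are equal, r leaves the effect size unchanged but scales its variance by 2(1 − r), so a wrong r changes how much weight each study gets.

```python
import math


def within_group_g(mean_pre, mean_post, sd_pre, sd_post, n, r=0.70):
    """Return (g, variance of g) for a pre-post design with assumed correlation r."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    # SD of the difference scores, implied by the two SDs and r.
    sd_diff = math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)
    # Rescale to the within-person SD so d is comparable to a between-group d.
    sd_within = sd_diff / math.sqrt(2 * (1 - r))
    d = (mean_pre - mean_post) / sd_within  # positive = scores went down
    j = 1 - 3 / (4 * (n - 1) - 1)           # small-sample correction, df = n - 1
    var_d = (1 / n + d**2 / (2 * n)) * 2 * (1 - r)
    return j * d, j**2 * var_d


# Hypothetical study: STAI drops from 45 to 37, SD 10 at both time points, n = 13.
for r in (0.5, 0.7, 0.9):
    g, var = within_group_g(45, 37, 10, 10, 13, r=r)
    print(f"r = {r}: g = {g:.2f}, variance = {var:.3f}")
```

The higher the assumed correlation, the smaller the variance and the more precise each within-group estimate appears.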

Key findings

Primary outcome — Reduction in stress/anxiety symptoms:

Within-group effect size (pre-post): Hedges' g = 0.81 (95% CI not reported in abstract, but described as "large")
Between-group effect size (biofeedback vs. control): Hedges' g = 0.83 (95% CI not reported in abstract)
Both effect sizes are considered "large" by Cohen's conventions (0.8 = large)

Moderator analyses (all non-significant):

Study year: no effect (QB not significant)
Risk of study bias: no effect (studies with higher bias did not show larger effects)
Percentage of females: no effect
Number of sessions: no effect (range 1–50 sessions)
Presence of an anxiety disorder diagnosis: no effect

Publication bias:

Fail-safe N: exceeded the threshold of 5K + 10 = 130 (where K = 24, the number of included studies), suggesting results are robust to publication bias
Funnel plot inspection: described as "symmetrical" (no obvious publication bias)
Trim-and-fill method: imputed effect size remained significant
Egger's regression intercept: not significant (no evidence of small-study bias)
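
The fail-safe N check above is simple arithmetic. The sketch below computes the 5K + 10 threshold and Rosenthal's fail-safe N from per-study z-scores. The z-scores are made up for illustration (the paper's per-study values are not given in this summary), and the function names are hypothetical.

```python
def failsafe_threshold(k):
    """Rule-of-thumb threshold: fail-safe N should exceed 5K + 10."""
    return 5 * k + 10


def rosenthal_failsafe_n(z_scores, z_crit=1.645):
    """Number of unpublished null studies needed to push the combined
    result above one-tailed p = .05 (Rosenthal's formula)."""
    k = len(z_scores)
    return (sum(z_scores) / z_crit) ** 2 - k


print(failsafe_threshold(24))  # threshold for the 24 included studies

# Hypothetical example: 24 studies, each with z = 2.0
hypothetical_z = [2.0] * 24
print(rosenthal_failsafe_n(hypothetical_z) > failsafe_threshold(24))
```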

Secondary findings:

13 of 24 studies included a control group
Control conditions varied: waitlist (6 studies), standard care/TAU (3 studies), sham biofeedback (2 studies), progressive muscle relaxation (1 study), daily thought record (1 study)
Effects were consistent across clinical and non-clinical populations

Effect magnitude

In plain English:

A between-group effect size of g = 0.83 means that the average person receiving HRV biofeedback had lower stress/anxiety scores than approximately 80% of people in the control group. This is a large effect — roughly equivalent to the difference in anxiety between someone with no diagnosed condition and someone with mild-to-moderate generalised anxiety disorder receiving no treatment.

To put it in concrete terms: if the average person in the control group sits at the 50th percentile, the average person in the biofeedback group would score better than about 80% of the control group. On a typical anxiety scale like the STAI (range 20–80), this might translate to a reduction of roughly 8–12 points, assuming a standard deviation of about 10–14 points (an illustrative figure, not one reported in the paper: 0.83 × 10 ≈ 8 and 0.83 × 14 ≈ 12). That would be enough to move someone from "moderate anxiety" to "mild anxiety" or from "mild anxiety" to within normal range.
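
Both conversions can be checked with a few lines of arithmetic. The percentile figure is the normal cumulative distribution evaluated at g (sometimes called Cohen's U3), which assumes roughly normal scores with similar spread in both groups. The points conversion multiplies g by a standard deviation; the values of 10 and 14 points below are illustrative assumptions, not figures from the paper.

```python
import math


def u3_percentile(g):
    """Share of the control group that the average treated person outperforms,
    assuming normal distributions with equal variance."""
    return 0.5 * (1 + math.erf(g / math.sqrt(2)))


def g_to_points(g, sd):
    """Convert a standardised effect size to raw scale points for a given SD."""
    return g * sd


print(f"U3 for g = 0.83: {u3_percentile(0.83):.1%}")
for sd in (10, 14):  # assumed STAI standard deviations
    print(f"SD = {sd}: about {g_to_points(0.83, sd):.1f} points")
```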

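The within-group effect (g = 0.81) is nearly identical to the between-group effect. Taken at face value, this suggests that control groups changed little on average, so natural recovery probably accounts for only a small share of the improvement. It does not rule out placebo or expectancy effects, because most controls were waitlist or treatment as usual and most studies were unblinded. The between-group estimate can also only draw on the 13 controlled studies, so the comparison is tentative.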

Important caveat: These are pooled estimates from small, heterogeneous studies. The true effect for any individual may be smaller or larger depending on adherence, device quality, breathing technique, and baseline stress levels.

Limitations

What the authors acknowledge:

Need for more well-controlled studies
Small number of studies with active control conditions (only 2 used sham biofeedback)
Heterogeneity in study populations and protocols
Lack of long-term follow-up data

What a critical reader would note:

Small sample sizes: 484 total participants across 24 studies means the average study had ~20 participants. Small studies inflate effect sizes and reduce reliability.
Self-report bias: All outcomes were self-reported. People who volunteer for a biofeedback study may be more motivated to report improvement. No objective physiological stress markers.
Lack of blinding: Only 2 of 24 studies used sham biofeedback. Participants knew they were getting an active treatment, creating strong expectancy effects. Biofeedback is inherently difficult to blind.
Few active comparators: Most control groups were waitlist or treatment-as-usual. We don't know if HRV biofeedback is better than other active interventions (e.g., meditation, exercise, CBT).
Publication bias risk: Despite statistical corrections, the meta-analysis only includes published studies. Unpublished null results may exist.
Dose-response uncertainty: Number of sessions (1–50) did not moderate effects, which is puzzling. This could mean even a single session works, or that the measure of "dose" is too crude (session length, home practice, adherence not captured).
Device variability: Studies used different biofeedback devices, some clinical-grade, some consumer-grade. Quality of feedback likely varies.
Population limits: Mostly healthy adults or specific clinical populations. Unknown generalisability to children, elderly, or people with medical conditions affecting HRV (e.g., diabetes, heart disease).
Short duration: Most studies measured effects immediately post-treatment. No data on whether benefits persist weeks or months later.
Breathing rate not standardised: While resonance frequency breathing (~6 breaths/min) is standard, not all studies verified that participants achieved this rate.
No intention-to-treat analysis: Many studies likely analysed only completers, which overestimates effects if dropouts had worse outcomes.

Practical takeaways

For someone running their own n=1 experiment:

What to test

Intervention: Slow, resonant-frequency breathing guided by HRV biofeedback. Target breathing rate: approximately 6 breaths per minute (5-second inhale, 5-second exhale, or 4-second inhale, 6-second exhale). Use a device that provides real-time feedback on HRV amplitude or coherence.
Dose: Based on the meta-analysis, even 1 session showed effects, but most studies used 4–10 sessions. Start with 10–20 minutes daily for 2–4 weeks.
Devices to consider: HeartMath Inner Balance (ear sensor + app), Elite HRV (chest strap + app), Oura Ring (night-time HRV), or any Bluetooth heart rate monitor paired with a biofeedback app (e.g., SweetBeat, HRV Biofeedback).
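
If you have no device with a built-in pacer, the breathing timing above is easy to generate yourself. The sketch below turns a target rate into inhale and exhale durations and builds a cue schedule. The function names and the inhale_fraction parameter are hypothetical, and it paces breathing only; it does not provide HRV feedback.

```python
def breath_cycle(breaths_per_min, inhale_fraction=0.5):
    """Return (inhale_seconds, exhale_seconds) for one breath."""
    cycle = 60 / breaths_per_min
    inhale = cycle * inhale_fraction
    return inhale, cycle - inhale


def pacer_schedule(total_seconds, breaths_per_min=6, inhale_fraction=0.5):
    """List of (start_time_in_seconds, cue) pairs covering whole breaths only."""
    inhale, exhale = breath_cycle(breaths_per_min, inhale_fraction)
    schedule, t = [], 0.0
    while t + inhale + exhale <= total_seconds:
        schedule.append((t, "inhale"))
        schedule.append((t + inhale, "exhale"))
        t += inhale + exhale
    return schedule


print(breath_cycle(6))                       # even split: 5 s in, 5 s out
print(breath_cycle(6, inhale_fraction=0.4))  # longer exhale: 4 s in, 6 s out
print(6 / 60, "Hz")                          # 6 breaths per minute as a frequency
print(pacer_schedule(30))                    # cues for the first three breaths
```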

Minimum meaningful duration

2 weeks minimum to see initial effects (most studies lasted 2–8 weeks)
4 weeks optimal to establish the habit and see reliable changes
Some studies showed effects after a single session, but these are likely short-lived

What to measure

Primary metric: Self-reported stress or anxiety using a validated scale. Best options for n=1:
- State-Trait Anxiety Inventory (STAI): two 20-item scales, about 5 minutes each. Available online. Measure both state (right now) and trait (general) anxiety weekly.
- Perceived Stress Scale (PSS-10): 10 items, 3 minutes. Measures stress over the past month. Take weekly.
- Visual Analog Scale (VAS): "Rate your stress right now from 0–100." Quick daily measure.
Secondary metrics:
- HRV itself: Measure resting HRV (RMSSD or HF power) each morning before getting out of bed. Use a consistent protocol: 5 minutes supine, same time daily, no caffeine beforehand.
- Heart rate: Resting heart rate and heart rate during a standardised stressor (e.g., 3 minutes of mental arithmetic)
- Sleep quality: Subjective (sleep diary) or objective (wearable)
- Mood: Daily mood rating (1–10)
Frequency: Daily for VAS and HRV; weekly for full questionnaires
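
Most apps report RMSSD for you, but if your monitor exports raw beat-to-beat (RR) intervals, the calculation is short. This sketch takes a list of RR intervals in milliseconds. The sample values are made up, and real recordings need artefact and ectopic-beat cleaning first, which is not shown here.

```python
import math


def rmssd(rr_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    if not diffs:
        raise ValueError("need at least two RR intervals")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical excerpt from a morning recording, in milliseconds
rr = [800, 810, 790, 805]
print(f"RMSSD = {rmssd(rr):.1f} ms")
```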

Key confounds to control for

Time of day: Measure and practice at the same time daily. HRV naturally varies with circadian rhythms.
Caffeine and alcohol: Both affect HRV. Avoid caffeine 4 hours before measurement; avoid alcohol 12 hours before.
Exercise: Intense exercise suppresses HRV for hours afterwards. Don't measure within 2 hours of exercise.
Meals: Eating affects HRV. Measure before meals or at least 2 hours after.
Medications: Beta-blockers, antidepressants, antihistamines, and many others affect HRV. Note any changes.
Menstrual cycle: HRV varies across the cycle. Track cycle phase if applicable.
Sleep: Poor sleep reduces HRV. Record sleep quality each night.
Breathing technique: Ensure you're actually breathing at resonance frequency. Use a pacer or timer. Don't just "try to relax."
Expectancy effects: You know you're doing an intervention. Consider a "sham" control period (e.g., 2 weeks of normal breathing while wearing the device but not following feedback) before starting active training.
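
One way to keep these rules honest is to log the time since each confound alongside every reading and flag readings that break the protocol. The sketch below uses the minimum gaps from the list above (4 hours for caffeine, 12 for alcohol, 2 for exercise, 2 for meals). The record layout and names are hypothetical.

```python
from dataclasses import dataclass, field

# Minimum gaps in hours, taken from the confound list above.
MIN_GAP_HOURS = {"caffeine": 4, "alcohol": 12, "exercise": 2, "meal": 2}


@dataclass
class Reading:
    """One HRV measurement plus hours elapsed since each potential confound."""
    rmssd_ms: float
    hours_since: dict = field(default_factory=dict)


def protocol_violations(reading):
    """Return the confounds whose minimum gap was not respected.

    A confound missing from hours_since is treated as 'long enough ago'.
    """
    return [
        name
        for name, min_gap in MIN_GAP_HOURS.items()
        if reading.hours_since.get(name, float("inf")) < min_gap
    ]


morning = Reading(42.0, {"caffeine": 1, "alcohol": 30, "exercise": 14, "meal": 10})
print(protocol_violations(morning))
```

Flagged readings can be excluded or analysed separately, rather than silently mixed in with clean ones.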

What a positive result would look like

STAI-S reduction of 8–12 points (e.g., from 45 to 35) after 4 weeks
PSS-10 reduction of 5–7 points (e.g., from 22 to 16)
Daily VAS stress rating decreasing from an average of 60/100 to 40/100
Resting HRV (RMSSD) increasing by 15–30% from baseline (e.g., from 35 ms to 45 ms)
Subjective feeling: Noticeably calmer during stressful situations, faster recovery after stress
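
To compare your own numbers against these benchmarks, a small summary of the baseline (or sham) phase versus the active phase is enough. The sketch below is a rough descriptive comparison with hypothetical names and made-up data. It is not a significance test, and daily readings from a single person are not independent.

```python
from statistics import mean


def percent_change(baseline, follow_up):
    """Relative change from baseline, in percent."""
    return 100 * (follow_up - baseline) / baseline


def compare_phases(baseline_values, active_values):
    """Phase means plus absolute and relative change between them."""
    base, active = mean(baseline_values), mean(active_values)
    return {
        "baseline_mean": base,
        "active_mean": active,
        "change": active - base,
        "percent_change": percent_change(base, active),
    }


# The RMSSD example above: 35 ms to 45 ms
print(f"{percent_change(35, 45):.1f}% increase")

# Made-up daily VAS stress ratings (0 to 100)
print(compare_phases([60, 62, 58], [40, 42, 38]))
```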
