SPIRIT 2013 explanation and elaboration: guidance for protocols of clinical trials

Authors	A.-W. Chan, Jennifer Tetzlaff, Peter C Gøtzsche, Douglas G. Altman, H Mann, Jesse A. Berlin, Kay Dickersin, A. Hrobjartsson, Kenneth F. Schulz, Wendy R. Parulekar, Karmela Krleža-Jerić, A. Laupacis, David Moher
Journal	BMJ
Year	2013
DOI	10.1136/bmj.e7586
Citations	7,001

TL;DR

This paper provides a 33-item checklist and detailed guidance for writing complete, transparent clinical trial protocols, which directly applies to anyone designing a rigorous self-experiment: a well-structured protocol prevents bias, ensures reproducibility, and forces you to think through confounds before you start collecting data.

What they tested

This is not an experimental study testing an intervention. It is a methodological guidance document: a consensus-based checklist developed by 115 international stakeholders (trial investigators, methodologists, statisticians, ethicists, journal editors, funders, regulators) to specify the minimum content that should appear in any clinical trial protocol. The "intervention" is the SPIRIT 2013 checklist itself. The "outcome" is whether a protocol adequately addresses 33 key items across five sections: administrative information, introduction, methods, ethics and dissemination, and appendices.

The checklist covers:

Administrative information (title, registration, funding, roles)
Introduction (background, rationale, objectives, trial design)
Methods (setting, eligibility, interventions, outcomes, sample size, randomisation, blinding, data collection, statistical analysis, data monitoring, harms, auditing)
Ethics and dissemination (ethics approval, consent, confidentiality, data access, publication policy)
Appendices (model consent forms, investigator CVs, data collection sheets)

Who was studied

No human participants were studied. The "subjects" were:

115 contributors from 11 countries who participated in the Delphi consensus survey and face-to-face meetings
Existing trial protocols (publicly available, from journals, investigators, and industry sponsors) used to source model examples for each checklist item
Empirical evidence from systematic literature searches (MEDLINE, EMBASE, Cochrane Methodology Register up to September 2009) to support or refute the importance of each checklist concept

The paper does not report a sample size for the number of protocols reviewed — it states that "model examples were selected to reflect how key elements could be appropriately described."

How they measured it

No measurement instruments were used on human subjects. The development process used three complementary methods:

Delphi consensus survey — a structured, multi-round survey where experts anonymously rated the importance of potential checklist items, with results fed back between rounds to build convergence
Two systematic reviews — one to identify existing protocol guidelines, another to find empirical evidence supporting specific checklist items
Two face-to-face consensus meetings — to finalise the checklist content

The checklist was then pilot-tested by graduate course students. The "measurement" was whether each item was deemed essential by the consensus process.

Methodology

Study design

This is a consensus development study combined with a systematic review and expert panel — not a randomised trial. The SPIRIT group used a predefined, transparent process modelled on established guidelines for developing reporting guidelines (the EQUATOR Network approach).

How they built the checklist

Delphi process (2007–2009): 115 stakeholders completed multiple rounds of surveys rating potential protocol items. This method reduces the influence of dominant personalities and allows anonymous input from diverse perspectives (clinicians, statisticians, ethicists, industry, regulators, patient representatives).
Systematic reviews: Two separate literature searches identified (a) existing protocol guidelines and (b) empirical evidence showing that specific protocol elements are often missing or poorly described. For example, they found that protocols frequently omit details on randomisation methods, blinding procedures, and sample size calculations.
Consensus meetings: Two in-person meetings with a subset of contributors finalised the 33-item checklist. Items were included only if they had strong empirical, pragmatic, or ethical rationale.
Pilot testing: Graduate students tested the checklist for clarity and feasibility.

What this design can and cannot prove

What it can prove:

That a diverse group of experts agreed on a minimum set of protocol items
That empirical evidence exists supporting the importance of many items
That model examples exist demonstrating feasibility of including each item

What it cannot prove:

That using the checklist actually improves trial quality or reduces bias — that would require a randomised trial comparing protocols written with vs. without SPIRIT guidance
That the checklist is complete or optimal — it represents consensus, not empirical validation
That following the checklist guarantees a good trial — it ensures transparent documentation, not good design

Major methodological strengths

Transparent, pre-specified development process
Broad stakeholder involvement (115 people from 11 countries)
Empirical evidence base for many items
Publicly available checklist and examples

Major methodological weaknesses

No formal testing of whether the checklist improves protocol quality or trial outcomes
The systematic review for empirical evidence stopped in 2009 (though the paper was published in 2013)
Some items (e.g., title, administrative details) were included based on "pragmatic or ethical rationale" rather than empirical evidence
The model examples were selected by the authors, not randomly sampled — potential selection bias

Key findings

The SPIRIT 2013 checklist contains 33 items organised into 5 sections. Here are the most critical items for someone designing a self-experiment:

Administrative Information (Items 1–5)

Item 1 (Title): Must describe study design, population, interventions, and acronym if applicable. Example: "A multi-center, investigator-blinded, randomized, 12-month, parallel-group, non-inferiority study..."
Item 2 (Registration): Trial identifier and registry name. For self-experiments, this means documenting your plan publicly before starting.
Item 4 (Funding): Sources of financial, material, and other support. For self-experiments: note any supplement company funding or conflicts.

Introduction (Items 6–8)

Item 6 (Background): Description of research question and justification, including summary of relevant studies examining benefits and harms. This forces you to do a literature review before starting.
Item 7 (Objectives): Specific objectives or hypotheses. Must be pre-specified.
Item 8 (Trial design): Type of trial (parallel group, crossover, factorial, single group), allocation ratio, and framework (superiority, equivalence, non-inferiority, exploratory).

Methods: Participants, Interventions, Outcomes (Items 9–13)

Item 10 (Eligibility criteria): Inclusion and exclusion criteria for participants. For self-experiments: define your own health status, baseline characteristics, and exclusion conditions.
Item 11a (Interventions): Sufficient detail to allow replication, including how and when administered. This is the most critical item — specify dose, timing, route, duration.
Item 11c (Adherence): Strategies to improve adherence and procedures for monitoring it (e.g., pill counts, logs).
Item 12 (Outcomes): Primary, secondary, and other outcomes, including specific measurement variables, analysis metrics, and timepoints. Must specify which outcome is primary.

Methods: Sample Size, Randomisation, Blinding (Items 14–17)

Item 14 (Sample size): How sample size was determined, including assumptions about effect size, variability, and statistical power. For n=1 experiments: explain why a single subject is sufficient (e.g., repeated measures, crossover design).
Item 16a (Sequence generation): Method of generating the random allocation sequence and the type of randomisation (simple, block, stratified). For self-experiments: use a random number generator or coin flip.
Item 16b (Allocation concealment): Mechanism that keeps the sequence concealed and prevents foreknowledge of intervention assignment. For self-experiments: have someone else prepare and hold the allocation schedule until each assignment is made.
Item 16c (Implementation): Who generates the allocation sequence, who enrols participants, and who assigns interventions. For n=1: you can run the generator yourself only if you never see its output; otherwise have someone else generate and conceal the sequence.
Item 17 (Blinding): Who is blinded (participants, care providers, outcome assessors) and how blinding is maintained. For self-experiments: use identical placebo, have someone else prepare doses, and avoid unblinding yourself.
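The allocation items above can be sketched in a few lines. This is a minimal illustration, assuming a paired crossover in which every pair of sessions contains one intervention and one placebo session in random order (block randomisation with blocks of two). The function name `paired_crossover_schedule` and the arm labels are hypothetical, not taken from the paper.

```python
import random

def paired_crossover_schedule(n_pairs, seed=None):
    """Build a session-by-session allocation list.

    Each pair of consecutive sessions holds one "intervention" and one
    "placebo" session, with the order inside the pair chosen at random.
    """
    rng = random.Random(seed)  # seed=None draws the seed from system entropy
    schedule = []
    for _ in range(n_pairs):
        arms = ["intervention", "placebo"]
        rng.shuffle(arms)  # randomise the order within this pair
        for arm in arms:
            schedule.append((len(schedule) + 1, arm))  # (session number, arm)
    return schedule

# Assumption: a helper runs this and keeps the output away from the experimenter.
schedule = paired_crossover_schedule(n_pairs=10, seed=2013)
for session, arm in schedule:
    print(f"Session {session:2d}: {arm}")
```

For concealment to mean anything, the person taking the capsules should not run this script or know the seed. A fixed seed is useful only so the helper can regenerate the same schedule later for auditing.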

Methods: Data Collection, Analysis, Monitoring (Items 18–21)

Item 18a (Data collection): Plans for assessment and collection of outcome, baseline, and other trial data, including instruments, timing, and procedures to promote data quality.
Item 20a (Statistical analysis): Statistical methods for analysing primary and secondary outcomes. For n=1: specify whether you'll use visual inspection, t-tests, effect size calculations, or Bayesian methods.
Item 21a (Data monitoring): Composition of data monitoring committee. For self-experiments: identify a trusted friend or colleague who can review your data and stop the experiment if harms emerge.
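A pre-specified analysis for paired n=1 data might look like the sketch below. It computes the paired t statistic and a paired-differences effect size (Cohen's d_z) using only the standard library. Because the standard library has no t-distribution, the p-value comes from an exact sign-flip randomisation test, which is a swapped-in technique rather than the t-test itself; with SciPy installed, `scipy.stats.ttest_rel` would give the t-test p-value. The run times are invented for illustration and are not data from the paper.

```python
import itertools
import math
import statistics

def paired_summary(intervention, placebo):
    """Paired t statistic and Cohen's d_z from matched sessions."""
    diffs = [a - b for a, b in zip(intervention, placebo)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the paired differences
    return {
        "n": n,
        "diffs": diffs,
        "mean_diff": mean_diff,
        "sd_diff": sd_diff,
        "t": mean_diff / (sd_diff / math.sqrt(n)),
        "cohens_dz": mean_diff / sd_diff,
    }

def sign_flip_p_value(diffs):
    """Exact two-sided randomisation test on paired differences.

    Under the null hypothesis each difference is equally likely to be
    positive or negative, so every sign pattern is enumerated.
    """
    observed = abs(sum(diffs))
    patterns = list(itertools.product((1, -1), repeat=len(diffs)))
    extreme = sum(
        1 for signs in patterns
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12
    )
    return extreme / len(patterns)

# Hypothetical 5 km run times in seconds (10 matched pairs).
caffeine = [1510, 1495, 1522, 1488, 1530, 1501, 1479, 1515, 1507, 1493]
placebo = [1542, 1518, 1530, 1525, 1551, 1499, 1520, 1540, 1533, 1512]

summary = paired_summary(caffeine, placebo)
p_value = sign_flip_p_value(summary["diffs"])
print(f"mean difference: {summary['mean_diff']:.1f} s")
print(f"t = {summary['t']:.2f}, d_z = {summary['cohens_dz']:.2f}, p = {p_value:.4f}")
```

Enumerating every sign pattern is practical only for small numbers of pairs (10 pairs gives 1,024 patterns). Neither test accounts for trends or carry-over between sessions, so those still have to be handled by the design.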

Harms, Auditing, Ethics and Dissemination (Items 22–31)

Item 22 (Harms): Plans for collecting, assessing, reporting, and managing adverse events. For self-experiments: define what counts as a harm, how you'll record it, and stopping rules.
Item 23 (Auditing): Frequency and procedures for auditing trial conduct. For self-experiments: schedule regular reviews of your data and protocol adherence.
Item 29 (Data access): Statement on who will have access to the final dataset. For self-experiments: consider sharing your anonymised data publicly.
Item 30 (Ancillary care): Provisions for ancillary or post-trial care. For self-experiments: plan for what happens if you discover a health issue during the experiment.

Appendices (Items 32–33)

Item 32 (Consent): Model consent form. For self-experiments: write a consent document for yourself acknowledging risks.
Item 33 (Biological specimens): Plans for collection, storage, and future use of biological specimens. Relevant if you're doing blood draws, saliva samples, etc.
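One way to keep track of the items above is to hold the draft protocol in a small structure and list what is still blank. The sketch below is only an illustration: the field names in `REQUIRED_FIELDS` are a hypothetical subset chosen for a self-experiment, not SPIRIT's own wording or numbering.

```python
# Hypothetical field names for a self-experiment protocol;
# these are not SPIRIT's official item labels.
REQUIRED_FIELDS = [
    "title", "background", "objectives", "trial_design", "eligibility",
    "interventions", "primary_outcome", "sample_size", "allocation",
    "blinding", "analysis_plan", "harms_and_stopping_rules",
]

def missing_fields(protocol):
    """Return the required fields that are absent or blank in the draft."""
    return [
        name for name in REQUIRED_FIELDS
        if not str(protocol.get(name) or "").strip()
    ]

draft = {
    "title": "Single-subject, placebo-controlled crossover trial of caffeine for 5 km run time",
    "objectives": "Caffeine reduces 5 km finish time compared to placebo",
    "primary_outcome": "5 km run time (seconds)",
    "blinding": "   ",  # whitespace only, so it counts as blank
}

todo = missing_fields(draft)
print("Still to write:", ", ".join(todo))
```

A check like this only confirms that each section has some text in it. Whether the text is adequate still needs a human read against the checklist.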

Effect magnitude

This is not applicable in the traditional sense — the SPIRIT paper does not report effect sizes. However, the practical effect of using the checklist can be estimated from prior research cited in the paper:

Incomplete protocols are common: Studies cited in the paper found that protocols often lack information on randomisation methods (missing in ~40–60% of protocols), blinding procedures (~30–50% missing), and sample size calculations (~20–40% missing).
Missing protocol information leads to biased reporting: When protocols are incomplete, trial reports are more likely to selectively report outcomes, change primary endpoints post-hoc, and omit harms.
The SPIRIT checklist aims to reduce these problems by forcing complete documentation before the trial starts.

For a self-experimenter, the "effect" is that using the checklist will likely:

Reduce the chance you forget to measure an important confound
Force you to pre-specify your primary outcome (reducing "p-hacking" or cherry-picking results)
Make your results more credible to others
Allow replication by other self-experimenters

Limitations

What the authors acknowledge

The checklist represents minimum content — additional items may be needed for specific trial designs (crossover, cluster, pragmatic trials)
Some items lack empirical evidence and were included based on "strong pragmatic or ethical rationale"
The checklist is not intended to prescribe how a trial should be designed — only what should be documented
The systematic review for empirical evidence was conducted up to 2009; newer evidence may exist

What a critical reader would note

No validation study: The checklist was never tested in a randomised trial to see if it actually improves protocol quality or reduces bias in trial results
Selection bias in examples: Model examples were chosen by the authors, not randomly sampled — they may represent best-case scenarios rather than typical protocols
Stakeholder representation: While 115 contributors is large, the paper does not report the demographic breakdown (e.g., how many were from low-income countries, how many were patient representatives)
Industry involvement: One author (Jesse A. Berlin) was from Janssen Research and Development (pharmaceutical company). The paper does not discuss how industry interests might have influenced the checklist
Length and complexity: The full checklist with explanation runs 42 pages — this may be overwhelming for a self-experimenter, though the core checklist itself is only 33 items
No guidance on prioritisation: All 33 items are presented as equally important, but some are clearly more critical for bias reduction (randomisation, blinding, pre-specified outcomes) than others (administrative details, consent form templates)

Practical takeaways

For someone running their own n=1 experiment, the SPIRIT checklist is a protocol template — not something to test, but something to use. Here's how to apply it:

What to do (not what to test)

Write a protocol document before starting your experiment that addresses these key SPIRIT items:

Title: "Single-subject, placebo-controlled, crossover trial of [intervention] for [outcome] in a healthy adult male/female aged [X]"
Background (Item 6): Write 2–3 paragraphs summarising what's known about your intervention, why you're testing it, and what gap you're filling
Objectives (Item 7): Write one specific, pre-registered hypothesis. Example: "I hypothesise that 200 mg of caffeine taken 30 minutes before a 5 km run will reduce my finish time by at least 30 seconds compared to placebo."
Trial design (Item 8): Specify crossover (most common for n=1), parallel (if you're comparing two conditions simultaneously), or single-group (least rigorous). State allocation ratio (1:1 for crossover).
Eligibility criteria (Item 10): Define your own baseline. Example: "Healthy male, age 32, no chronic conditions, non-smoker, no caffeine tolerance (≤1 cup coffee/day), no sleep disorders, no current medications."
Interventions (Item 11a): Specify exact dose, timing, route, duration. Example: "200 mg caffeine anhydrous capsule taken orally with 250 mL water at 07:00, 30 minutes before run. Placebo: identical capsule containing microcrystalline cellulose."
Outcomes (Item 12): Pre-specify ONE primary outcome. Example: "Primary: 5 km run time (seconds). Secondary: heart rate at 1 km intervals (bpm), rating of perceived exertion (Borg CR10 scale, 0–10), sleep quality that night (subjective 1–5 scale)."
Sample size (Item 14): For n=1 crossover, specify number of repetitions. Example: "18 pairs of intervention and placebo sessions (36 total runs), based on ability to detect a 30-second difference with 80% power at two-sided α = 0.05, assuming the paired differences have an SD of 45 seconds (normal approximation)."
Randomisation (Items 16a–16c): Use a random number generator (e.g., random.org) to create the sequence. Have someone else prepare opaque envelopes containing the assignment for each session.
Blinding (Item 17): Use identical capsules prepared by someone else. You should not know which is which until after analysis. If blinding is impossible (e.g., exercise intervention), use objective outcome measures.
Statistical analysis (Item 20a): Pre-specify your analysis. Example: "Paired t-test comparing mean run times between caffeine and placebo conditions. Effect size reported as Cohen's d. Significance threshold: p < 0.05."
Harms (Item 22): Define stopping rules. Example: "If I experience chest pain, severe anxiety, or insomnia lasting >2 nights, I will stop the experiment and unblind."
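The sample size reasoning in the list above can be checked with a rough calculation. The sketch below uses the standard normal-approximation formula for a paired comparison, n = ((z_{1-α/2} + z_{power}) · SD / difference)², with the 30-second difference and 45-second SD from the example. It assumes that the 45 seconds is the SD of the paired differences, which the example does not state explicitly, and the function name `pairs_needed` is hypothetical.

```python
import math
from statistics import NormalDist

def pairs_needed(min_difference, sd_of_differences, alpha=0.05, power=0.80):
    """Approximate number of pairs for a two-sided paired comparison.

    Normal approximation; a t-based calculation gives a slightly
    larger answer for small samples.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    n = ((z_alpha + z_power) * sd_of_differences / min_difference) ** 2
    return math.ceil(n)

n_pairs = pairs_needed(min_difference=30, sd_of_differences=45)
print(f"Pairs needed: {n_pairs} ({2 * n_pairs} sessions in total)")
```

Under these assumptions the calculation returns 18 pairs. Halving the SD or doubling the target difference cuts the requirement roughly fourfold, which is why a pilot estimate of your own session-to-session variability is worth collecting first.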

Minimum meaningful duration

For acute interventions (caffeine, supplements, short-term diet changes): At least 5–10 pairs of intervention/control sessions (10–20 total measurement days)
For chronic interventions (exercise programmes, meditation, long-term diet): At least 4–8 weeks per condition, with daily measurements
For crossover designs: Include a washout period of at least 5 half-lives of the intervention between conditions
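The five-half-life rule can be turned into a quick calculation. The sketch below assumes simple first-order (exponential) elimination, which is an idealisation, and the 5-hour half-life is a placeholder value rather than a figure from the paper. Look up the half-life for your own intervention.

```python
def washout_hours(half_life_hours, n_half_lives=5):
    """Minimum washout length under the n-half-lives rule."""
    return n_half_lives * half_life_hours

def fraction_remaining(elapsed_hours, half_life_hours):
    """Fraction of the dose left after elapsed_hours, assuming
    first-order elimination (the amount halves every half-life)."""
    return 0.5 ** (elapsed_hours / half_life_hours)

half_life = 5.0  # hypothetical placeholder, in hours
washout = washout_hours(half_life)
remaining = fraction_remaining(washout, half_life)
print(f"Washout: {washout:.0f} h, about {remaining:.1%} of the dose remaining")
```

After five half-lives about 3% of the dose remains (0.5 to the power 5 is 1/32), which is the usual justification for the rule. Interventions with lasting physiological effects, such as training adaptations, can need a longer gap than the pharmacokinetics alone suggest.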
