The CMA Evolution Strategy: A Tutorial — DoOperator Research

Authors	Nikolaus Hansen
Journal	arXiv
Year	2016

TL;DR

This tutorial explains the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), a powerful algorithm for solving difficult optimization problems where you cannot compute gradients—useful for anyone tuning complex systems, from machine learning hyperparameters to engineering designs, without needing to understand the underlying mathematics.

Key findings

Core innovation: The covariance matrix adaptation allows CMA-ES to learn the shape of the objective function landscape, making it effective on non-separable problems where variables interact. On the Rosenbrock function (a classic non-separable test), CMA-ES converges to the global optimum in approximately 10^4 function evaluations for dimension 10, compared to 10^5+ for simpler strategies.
Step-size control: The cumulative step-size adaptation (CSA) mechanism prevents premature convergence. On the sphere function, CMA-ES achieves linear convergence (error decreases exponentially with evaluations) up to machine precision, whereas fixed-step-size strategies plateau.
Population size: The default population size is λ = 4 + floor(3 ln(N)), where N is the dimension. For N=10, this gives λ ≈ 11. Increasing population size improves robustness but slows convergence. A population size of λ = 10–20 is typical for most problems.
Scaling: The algorithm scales roughly as O(N^3) per generation due to covariance matrix updates, but this is acceptable for N up to ~100. For N=10, a typical run requires 10^3–10^4 function evaluations.
Restart strategy: The tutorial recommends running CMA-ES multiple times with increasing population sizes (IPOP-CMA-ES) to handle multimodal problems. This strategy finds the global optimum on the Rastrigin function (a highly multimodal benchmark) in ~10^5 evaluations for N=10, compared to ~10^6 for a single run.
Noise handling: CMA-ES can handle noisy objective functions by increasing the population size. With λ = 100, it can achieve reliable optimization on problems with 10% Gaussian noise.

Read full paper →PDF ↗More Evolutionary Methods →