← Research / Evolutionary Methods

The CMA Evolution Strategy: A Tutorial

PaperWikiEvolutionary MethodsUnrated
Read full paper →PDF ↗
AuthorsNikolaus Hansen
JournalarXiv
Year2016

TL;DR

This tutorial explains the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), a powerful algorithm for solving difficult optimization problems where you cannot compute gradients—useful for anyone tuning complex systems, from machine learning hyperparameters to engineering designs, without needing to understand the underlying mathematics.

Key findings

  • Core innovation: The covariance matrix adaptation allows CMA-ES to learn the shape of the objective function landscape, making it effective on non-separable problems where variables interact. On the Rosenbrock function (a classic non-separable test), CMA-ES converges to the global optimum in approximately 10^4 function evaluations for dimension 10, compared to 10^5+ for simpler strategies.
  • Step-size control: The cumulative step-size adaptation (CSA) mechanism prevents premature convergence. On the sphere function, CMA-ES achieves linear convergence (error decreases exponentially with evaluations) up to machine precision, whereas fixed-step-size strategies plateau.
  • Population size: The default population size is λ = 4 + floor(3 ln(N)), where N is the dimension. For N=10, this gives λ ≈ 11. Increasing population size improves robustness but slows convergence. A population size of λ = 10–20 is typical for most problems.
  • Scaling: The algorithm scales roughly as O(N^3) per generation due to covariance matrix updates, but this is acceptable for N up to ~100. For N=10, a typical run requires 10^3–10^4 function evaluations.
  • Restart strategy: The tutorial recommends running CMA-ES multiple times with increasing population sizes (IPOP-CMA-ES) to handle multimodal problems. This strategy finds the global optimum on the Rastrigin function (a highly multimodal benchmark) in ~10^5 evaluations for N=10, compared to ~10^6 for a single run.
  • Noise handling: CMA-ES can handle noisy objective functions by increasing the population size. With λ = 100, it can achieve reliable optimization on problems with 10% Gaussian noise.
Read full paper →PDF ↗More Evolutionary Methods

Related papers

Paper

Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning

Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti +3 more · 2017

Paper

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

Tim Salimans, Jonathan Ho, Xi Chen +1 more · 2017

Paper

The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities

J. Lehman, J. Clune, D. Misevic +50 more · 2018

Paper

Population Based Training of Neural Networks

Max Jaderberg, Valentin Dalibard, Simon Osindero +9 more · 2017