
Controller: Evolution Strategies

Learn how the controller is trained with CMA-ES, an evolution strategy.



Overview

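The Controller (C) in World Models is a single-layer linear model. It maps the combined representation (z, h), meaning the VAE latent vector z_t concatenated with the MDN-RNN hidden state h_t, directly to an action: a_t = W_c [z_t, h_t] + b_c.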
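The mapping a_t = W_c [z_t, h_t] + b_c (weight matrix W_c, bias b_c) fits in a few lines. The following is a minimal sketch assuming NumPy; the class name `LinearController`, its methods, and the CarRacing-like sizes in the usage lines are illustrative assumptions, not code from the original work. Any squashing or clipping of the output to the environment's action range is left out.

```python
import numpy as np

class LinearController:
    """C: a single linear layer a_t = W_c [z_t, h_t] + b_c (illustrative sketch)."""

    def __init__(self, z_dim: int, h_dim: int, action_dim: int) -> None:
        self.W = np.zeros((action_dim, z_dim + h_dim))  # W_c
        self.b = np.zeros(action_dim)                   # b_c

    @property
    def num_params(self) -> int:
        return self.W.size + self.b.size

    def set_params(self, theta: np.ndarray) -> None:
        """Unpack a flat parameter vector, the form an evolution strategy searches over."""
        n_w = self.W.size
        self.W = np.asarray(theta[:n_w], dtype=float).reshape(self.W.shape).copy()
        self.b = np.asarray(theta[n_w:n_w + self.b.size], dtype=float).copy()

    def act(self, z: np.ndarray, h: np.ndarray) -> np.ndarray:
        # Concatenate the VAE latent z_t with the MDN-RNN hidden state h_t, then apply W_c and b_c.
        return self.W @ np.concatenate([z, h]) + self.b

# Usage with assumed CarRacing-like sizes: 32-d z, 256-d h, 3 actions
ctrl = LinearController(z_dim=32, h_dim=256, action_dim=3)
ctrl.set_params(np.random.default_rng(0).standard_normal(ctrl.num_params) * 0.1)
action = ctrl.act(np.zeros(32), np.zeros(256))
```

Exposing the parameters as one flat vector is the design choice that matters here: a black-box optimizer only ever sees that vector and a score for it.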

Why a Simple Controller?

The key insight is that most of the "intelligence" is captured by the World Model:

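  • The VAE (V) compresses each observation into a compact latent vector z
  • The MDN-RNN (M) learns the environment's dynamics, predicting future z values and carrying a memory in its hidden state h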

The controller therefore only needs to map these compressed representations to actions, so it can be kept deliberately small.
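As a worked example, the size of such a controller follows directly from its input and output dimensions. This uses the CarRacing settings from the original World Models paper: a latent size N_z = 32, an MDN-RNN hidden size N_h = 256, and N_a = 3 continuous actions.

```latex
|\theta_C| = \underbrace{(N_z + N_h)\,N_a}_{W_c} + \underbrace{N_a}_{b_c}
           = (32 + 256) \times 3 + 3 = 867
```

That is orders of magnitude fewer parameters than V and M contain.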

Evolution Strategies

Instead of gradient-based optimization, the controller's parameters are optimized with the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). CMA-ES treats the cumulative reward of a rollout as a black-box fitness score.
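For concreteness, here is a compact CMA-ES sketch in Python with NumPy that follows the default parameter settings in Hansen's tutorial (listed in the references). It minimizes, so a controller's fitness would be its negated average return. The function name `cma_es`, its arguments, and the sphere test objective are illustrative. This is not the World Models authors' code, and a maintained implementation such as the `cma` package on PyPI would normally be preferred.

```python
import numpy as np

def cma_es(f, x0, sigma0, max_iters=300, seed=0):
    """Minimize f with a compact (mu/mu_w, lambda)-CMA-ES; a sketch, not production code."""
    rng = np.random.default_rng(seed)
    n = len(x0)
    lam = 4 + int(3 * np.log(n))                 # population size (tutorial default)
    mu = lam // 2                                # number of selected parents
    weights = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    weights /= weights.sum()                     # positive recombination weights summing to 1
    mueff = 1.0 / np.sum(weights ** 2)           # variance-effective selection mass

    # Learning rates and damping (default settings from Hansen's tutorial)
    cc = (4 + mueff / n) / (n + 4 + 2 * mueff / n)
    cs = (mueff + 2) / (n + mueff + 5)
    c1 = 2 / ((n + 1.3) ** 2 + mueff)
    cmu = min(1 - c1, 2 * (mueff - 2 + 1 / mueff) / ((n + 2) ** 2 + mueff))
    damps = 1 + 2 * max(0.0, np.sqrt((mueff - 1) / (n + 1)) - 1) + cs
    chi_n = np.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n ** 2))  # approx. E||N(0, I)||

    mean = np.asarray(x0, dtype=float).copy()
    sigma = float(sigma0)
    C = np.eye(n)
    pc, ps = np.zeros(n), np.zeros(n)            # evolution paths for C and sigma
    best_x, best_f = mean.copy(), np.inf

    for gen in range(max_iters):
        eigvals, B = np.linalg.eigh(C)           # C = B diag(D**2) B^T
        D = np.sqrt(np.maximum(eigvals, 1e-20))
        Z = rng.standard_normal((lam, n))
        Y = (Z * D) @ B.T                        # rows y_k = B D z_k ~ N(0, C)
        X = mean + sigma * Y                     # candidate parameter vectors
        fitness = np.array([f(x) for x in X])    # independent evaluations (parallelizable)
        order = np.argsort(fitness)
        if fitness[order[0]] < best_f:
            best_f, best_x = float(fitness[order[0]]), X[order[0]].copy()

        Y_sel = Y[order[:mu]]                    # the mu best steps
        y_w = weights @ Y_sel                    # weighted recombination
        mean = mean + sigma * y_w

        # Cumulative step-size adaptation uses C^{-1/2} y_w = B D^{-1} B^T y_w
        ps = (1 - cs) * ps + np.sqrt(cs * (2 - cs) * mueff) * (B @ ((B.T @ y_w) / D))
        ps_norm = np.linalg.norm(ps) / np.sqrt(1 - (1 - cs) ** (2 * (gen + 1)))
        hsig = float(ps_norm / chi_n < 1.4 + 2 / (n + 1))
        pc = (1 - cc) * pc + hsig * np.sqrt(cc * (2 - cc) * mueff) * y_w

        # Covariance update: rank-one (pc) plus rank-mu (selected steps)
        rank_mu = (Y_sel.T * weights) @ Y_sel
        C = ((1 - c1 - cmu) * C
             + c1 * (np.outer(pc, pc) + (1 - hsig) * cc * (2 - cc) * C)
             + cmu * rank_mu)
        C = (C + C.T) / 2                        # keep C numerically symmetric
        sigma *= np.exp((cs / damps) * (np.linalg.norm(ps) / chi_n - 1))

    return best_x, best_f

# Usage on a toy objective (the sphere function) in 5 dimensions
x_best, f_best = cma_es(lambda x: float(np.sum(x ** 2)), x0=np.full(5, 3.0), sigma0=1.0)
```

Each generation samples λ candidates from N(m, σ²C) and recombines the best μ into a new mean. It then adapts the shape of the search distribution C through the rank-one and rank-μ updates, and the step size σ through the evolution path p_σ.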

Why Evolution Strategies?

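  1. Gradient-free: Only the scalar cumulative reward of each rollout is needed; no gradient of the reward with respect to the controller's parameters is required
  2. Parallelizable: Candidates are evaluated independently, so many can run simultaneously
  3. Robust search: Sampling a whole population of candidates can make the search less prone to getting stuck in poor local optima than following a single gradient, although it does not guarantee finding the global optimum
  4. Suited to small models: CMA-ES adapts a full covariance matrix over the parameters, which is practical for a controller with a few hundred to a few thousand parameters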
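The first two properties can be illustrated with a minimal sketch of the simpler, isotropic evolution strategy described by Salimans et al. (listed in the references). This is not CMA-ES. It estimates a search direction purely from the scalar fitness values of mirrored random perturbations, and each evaluation is independent of the others. The function name `mirrored_es`, the hyperparameters, and the quadratic test objective are illustrative assumptions, and refinements from that paper such as rank-based fitness shaping are omitted.

```python
import numpy as np

def mirrored_es(fitness, theta0, sigma=0.1, lr=0.05, n_pairs=25, iters=300, seed=0):
    """Maximize `fitness` using only scalar evaluations (isotropic Gaussian ES, sketch)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(iters):
        eps = rng.standard_normal((n_pairs, theta.size))
        # Each of these 2 * n_pairs evaluations is independent, so they parallelize trivially.
        f_plus = np.array([fitness(theta + sigma * e) for e in eps])
        f_minus = np.array([fitness(theta - sigma * e) for e in eps])
        # Monte Carlo estimate of the gradient of the Gaussian-smoothed objective,
        # built from scalar fitness values only (no backpropagation through the task).
        grad = eps.T @ (f_plus - f_minus) / (2 * n_pairs * sigma)
        theta += lr * grad
    return theta

# Usage: recover a target vector by maximizing a negated squared distance
target = np.array([1.0, -2.0, 0.5])
theta_hat = mirrored_es(lambda th: -float(np.sum((th - target) ** 2)), np.zeros(3))
```

Mirrored (antithetic) pairs cancel the symmetric part of the fitness noise. Because each pair is evaluated independently, the two list comprehensions are the natural place to distribute work across processes or machines.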

Training in Dreams

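A distinctive feature of World Models is dream training, in which the controller learns inside the environment simulated by the MDN-RNN:

  1. Collect real data: Run random rollouts in the actual environment
  2. Train V and M: Fit the VAE and the MDN-RNN on the collected data to learn the World Model
  3. Train C in dreams: Use the MDN-RNN to simulate experiences and score controller candidates without touching the real environment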
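Step 3 amounts to computing a controller's fitness from the learned model alone. The sketch below assumes NumPy. `world_model_step(z, h, a) -> (z_next, h_next, reward, done)` is a hypothetical interface standing in for the trained MDN-RNN, and `toy_model_step` is an invented placeholder used only for demonstration. The mixture-density sampling of the next z and the temperature τ used in the original work are deliberately left out.

```python
import numpy as np

def dream_return(W, b, world_model_step, z0, h0, horizon=1000):
    """Cumulative reward of a linear controller rolled out inside a learned model.

    world_model_step(z, h, a) -> (z_next, h_next, reward, done) is a hypothetical
    interface standing in for the trained MDN-RNN; no real environment is used.
    """
    z, h = np.asarray(z0, dtype=float), np.asarray(h0, dtype=float)
    total = 0.0
    for _ in range(horizon):
        a = W @ np.concatenate([z, h]) + b              # C picks an action from (z_t, h_t)
        z, h, reward, done = world_model_step(z, h, a)  # M "dreams" the next step
        total += reward
        if done:
            break
    return total

def toy_model_step(z, h, a):
    """Hypothetical stand-in for M: +1 reward per step; the episode ends if |sum(a)| > 1."""
    h_next = np.tanh(h + 0.1 * np.sum(a))
    return z, h_next, 1.0, bool(abs(np.sum(a)) > 1.0)

# Usage: a do-nothing controller (2-d z, 3-d h, 1 action) survives the whole dream
W, b = np.zeros((1, 5)), np.zeros(1)
score = dream_return(W, b, toy_model_step, np.zeros(2), np.zeros(3), horizon=50)
```

Negating this return gives a fitness that an evolution-strategy minimizer could use to score controller candidates entirely inside the dream.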

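A controller trained entirely in dreams can transfer back to the real environment. In the original paper's VizDoom experiment, a controller trained only inside the MDN-RNN's simulation solved the task when deployed in the actual game. The authors also note that the controller can exploit imperfections in the learned model. They counter this by sampling the dream at a higher temperature τ, which makes the simulation more unpredictable.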

References

  • Nikolaus Hansen (2016). The CMA Evolution Strategy: A Tutorial.
  • Tim Salimans et al. (2017). Evolution Strategies as a Scalable Alternative to Reinforcement Learning.