Deep Surrogate Assisted Generation of Environments

Varun Bhatt*
University of Southern California
vsbhatt@usc.edu

Bryon Tjanaka*
University of Southern California
tjanaka@usc.edu

Matthew C. Fontaine*
University of Southern California
mfontain@usc.edu

Stefanos Nikolaidis
University of Southern California
nikolaid@usc.edu

Abstract

Recent progress in reinforcement learning (RL) has started producing generally capable agents that can solve a distribution of complex environments. These agents are typically tested on fixed, human-authored environments. On the other hand, quality diversity (QD) optimization has proven to be an effective component of environment generation algorithms, which can generate collections of high-quality environments that are diverse in the resulting agent behaviors. However, these algorithms require potentially expensive simulations of agents on newly generated environments. We propose Deep Surrogate Assisted Generation of Environments (DSAGE), a sample-efficient QD environment generation algorithm that maintains a deep surrogate model for predicting agent behaviors in new environments. Results in two benchmark domains show that DSAGE significantly outperforms existing QD environment generation algorithms in discovering collections of environments that elicit diverse behaviors of a state-of-the-art RL agent and a planning agent.

This website displays videos associated with the figures in the paper. We have arranged the videos to match the figure layouts in the paper as closely as possible. Click on the videos to view them individually.

Figure 1

An overview of the Deep Surrogate Assisted Generation of Environments (DSAGE) algorithm. DSAGE exploits a deep surrogate model to fill an archive of solutions (blue arrows), which are then evaluated by simulating an agent (red arrows). The surrogate model is then trained on the data from the simulations (yellow arrows).
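The three arrow colors in Figure 1 correspond to the three phases of each outer iteration. The loop below is a minimal, self-contained sketch of that structure; the toy `Surrogate` (a nearest-neighbor lookup), `simulate_agent`, the 2-D environment encoding, and the grid-archive resolution are all illustrative stand-ins, not the paper's actual deep network, simulator, or archive implementation.

```python
import random

random.seed(0)


def simulate_agent(env):
    """Stand-in for an expensive agent simulation.
    Returns (objective, behavior measures) for an environment."""
    x, y = env
    return -((x - 0.5) ** 2), (x, y)  # toy objective and 2-D measures


class Surrogate:
    """Toy surrogate: stores evaluated data, predicts via nearest neighbor.
    (DSAGE uses a deep network here; this keeps the sketch dependency-free.)"""

    def __init__(self):
        self.data = []

    def train(self, envs, results):
        self.data.extend(zip(envs, results))

    def predict(self, env):
        if not self.data:
            return 0.0, env  # untrained: fall back to a default guess
        nearest = min(
            self.data,
            key=lambda d: (d[0][0] - env[0]) ** 2 + (d[0][1] - env[1]) ** 2,
        )
        return nearest[1]


def archive_key(measures, bins=10):
    """Discretize behavior measures into a grid-archive cell index."""
    return tuple(min(bins - 1, int(m * bins)) for m in measures)


def insert(archive, env, obj, measures):
    """Keep the highest-objective environment per archive cell (MAP-Elites style)."""
    key = archive_key(measures)
    if key not in archive or obj > archive[key][1]:
        archive[key] = (env, obj)


surrogate = Surrogate()
ground_truth_archive = {}

for outer_iter in range(5):
    # 1. Exploit the surrogate to cheaply fill an inner archive (blue arrows).
    surrogate_archive = {}
    for _ in range(200):
        env = (random.random(), random.random())
        obj, measures = surrogate.predict(env)
        insert(surrogate_archive, env, obj, measures)

    # 2. Evaluate the surrogate archive's elites with real simulations (red arrows).
    candidates = [env for env, _ in surrogate_archive.values()]
    results = [simulate_agent(env) for env in candidates]
    for env, (obj, measures) in zip(candidates, results):
        insert(ground_truth_archive, env, obj, measures)

    # 3. Train the surrogate on the new simulation data (yellow arrows).
    surrogate.train(candidates, results)

print(f"Archive cells filled: {len(ground_truth_archive)}")
```

The key efficiency gain is in step 1: the surrogate archive is filled with many cheap predictions, so the expensive simulator in step 2 only runs on the predicted elites.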

Figure 3

(a)

Number of wall cells: 43
Mean agent path length: MAX

(b)

Number of wall cells: 72
Mean agent path length: 610

(c)

Number of wall cells: 50
Mean agent path length: 81

(d)

Number of wall cells: 100
Mean agent path length: 297

(e)

Number of wall cells: 150
Mean agent path length: 369

(f)

Number of wall cells: 188
Mean agent path length: 636

(g)

Number of wall cells: 121
Mean agent path length: 200

(h)

Number of wall cells: 214
Mean agent path length: 8

Figure 4

(a)

Sky tiles: 40, Number of jumps: 40

(b)

Sky tiles: 1, Number of jumps: 3

(c)

Sky tiles: 100, Number of jumps: 50

(d)

Sky tiles: 140, Number of jumps: 6

Figure 11

(a)

Maze exploration: 40%
Repeated visits: 504

(b)

Maze exploration: 15%
Repeated visits: 632

(c)

Maze exploration: 51%
Repeated visits: 108

(d)

Maze exploration: 28%
Repeated visits: 6

(e)

Maze exploration: 97%
Repeated visits: 625

(f)

Maze exploration: 60%
Repeated visits: 402

(g)

Maze exploration: 96%
Repeated visits: 12

(h)

Maze exploration: 79%
Repeated visits: 192

Figure 12

(a)

Maze exploration: 39%
Number of wall cells: 155

(b)

Maze exploration: 55%
Number of wall cells: 213

(c)

Maze exploration: 12%
Number of wall cells: 60

(d)

Maze exploration: 17%
Number of wall cells: 48

(e)

Maze exploration: 92%
Number of wall cells: 100

(f)

Maze exploration: 97%
Number of wall cells: 100

(g)

Maze exploration: 97%
Number of wall cells: 43

(h)

Maze exploration: 58%
Number of wall cells: 42

Figure 13

(a)

Number of wall cells: 56
Repeated visits: 604

(b)

Number of wall cells: 100
Repeated visits: 401

(c)

Number of wall cells: 45
Repeated visits: 18

(d)

Number of wall cells: 75
Repeated visits: 149

(e)

Number of wall cells: 181
Repeated visits: 271

(f)

Number of wall cells: 202
Repeated visits: MAX

(g)

Number of wall cells: 150
Repeated visits: 77

(h)

Number of wall cells: 219
Repeated visits: 0

Figure 14

(a)

Sky tiles: 0, Enemies killed: 5

(b)

Sky tiles: 63, Enemies killed: 15

(c)

Sky tiles: 120, Enemies killed: 2

(d)

Sky tiles: 144, Enemies killed: 7

Figure 15

(a)

Number of jumps: 2, Enemies killed: 5

(b)

Number of jumps: 0, Enemies killed: 0

(c)

Number of jumps: 15, Enemies killed: 2

(d)

Number of jumps: 41, Enemies killed: 10

(e)

Number of jumps: 53, Enemies killed: 1