For this example, consider a 5-by-5 grid world with these rules:
The world is bounded by borders, with four possible actions: North = 1, South = 2, East = 3, West = 4.
The agent begins from cell [2,1] (second row, first column, indicated by the red circle in the figure).
The agent receives reward +10 if it reaches the terminal state at cell [5,5] (blue cell).
The environment contains a special jump from cell [2,4] to cell [4,4] with +5 reward (blue arrow).
The agent is blocked by obstacles in cells [3,3], [3,4], [3,5], and [4,3] (black cells).
All other actions result in a –1 reward.
First, create a GridWorld object using the createGridWorld function.
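A minimal sketch of the call that would produce the display below; the only inputs are the grid dimensions, and omitting the semicolon displays the object:

% Create a 5-by-5 grid world model
GW = createGridWorld(5,5)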
GW =
GridWorld with properties:
GridSize: [5 5]
CurrentState: "[1,1]"
States: [25×1 string]
Actions: [4×1 string]
T: [25×25×4 double]
R: [25×25×4 double]
ObstacleStates: [0×1 string]
TerminalStates: [0×1 string]
ProbabilityTolerance: 8.8818e-16
Then set the initial, terminal, and obstacle states.
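A sketch of these assignments, using the cell names from the rules above (GridWorld states are named with strings of the form "[row,column]"):

% Initial state (red circle), terminal state (blue cell), and obstacles (black cells)
GW.CurrentState = '[2,1]';
GW.TerminalStates = '[5,5]';
GW.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];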
Update the state transition matrix for the obstacle states and set the jump rule over the obstacle states.
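A sketch of this step, assuming the toolbox functions updateStateTranstionForObstacles (which makes the transition matrix consistent with the obstacle cells) and state2idx (which converts a state name to its index in T):

% Make transitions consistent with the obstacle cells
updateStateTranstionForObstacles(GW)

% Jump rule: any action taken in [2,4] moves the agent to [4,4]
GW.T(state2idx(GW,"[2,4]"),:,:) = 0;
GW.T(state2idx(GW,"[2,4]"),state2idx(GW,"[4,4]"),:) = 1;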
Next, define the rewards in the reward transition matrix.
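A sketch of the reward definition implied by the rules above: –1 for every move, +5 for the jump, and +10 for reaching the terminal state (state2idx is again assumed for name-to-index conversion):

% Default reward of -1 for every transition
nS = numel(GW.States);
nA = numel(GW.Actions);
GW.R = -1*ones(nS,nS,nA);

% Reward of +5 for the jump from [2,4] to [4,4]
GW.R(state2idx(GW,"[2,4]"),state2idx(GW,"[4,4]"),:) = 5;

% Reward of +10 for any transition into the terminal state [5,5]
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;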
Now, use rlMDPEnv to create a grid world environment using the GridWorld object GW.
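The call that would produce the display below:

% Wrap the GridWorld model in an MDP environment
env = rlMDPEnv(GW)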
env =
rlMDPEnv with properties:
Model: [1×1 rl.env.GridWorld]
ResetFcn: []
You can visualize the grid world environment using the plot function.
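For example, the following call opens a figure showing the grid, the agent location, and the terminal and obstacle cells:

% Display the grid world
plot(env)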
Use the getActionInfo and getObservationInfo functions to extract the action and observation specification objects from the environment.
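For example, the action specification shown below could be obtained with:

actInfo = getActionInfo(env)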
actInfo =
rlFiniteSetSpec with properties:
Elements: [4×1 double]
Name: "MDP Actions"
Description: [0×0 string]
Dimension: [1 1]
DataType: "double"
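Similarly for the observation specification:

obsInfo = getObservationInfo(env)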
obsInfo =
rlFiniteSetSpec with properties:
Elements: [25×1 double]
Name: "MDP Observations"
Description: [0×0 string]
Dimension: [1 1]
DataType: "double"
You can now use the action and observation specifications to create an agent for your environment, and then use the train and sim functions to train and simulate the agent within the environment.
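As an illustration only, a tabular Q-learning agent could be assembled and trained roughly as follows; the agent type, options, and training settings here are assumptions, not part of this example:

% Illustrative sketch (assumed agent type and settings)
qTable = rlTable(obsInfo,actInfo);                  % tabular Q-value model
critic = rlQValueFunction(qTable,obsInfo,actInfo);  % Q-value function from the table
agentOpts = rlQAgentOptions;                        % default Q-learning agent options
agent = rlQAgent(critic,agentOpts);                 % Q-learning agent

trainOpts = rlTrainingOptions("MaxEpisodes",200,"MaxStepsPerEpisode",50);
trainStats = train(agent,env,trainOpts);            % train the agent in the environment
experience = sim(env,agent);                        % simulate the trained agent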