For this example, consider a 5-by-5 grid world with these rules:
The world is bounded by borders, with four possible actions: North = 1, South = 2, East = 3, West = 4.
The agent begins from cell [2,1] (second row, first column, indicated by the red circle in the figure).
The agent receives reward +10 if it reaches the terminal state at cell [5,5] (blue cell).
The environment contains a special jump from cell [2,4] to cell [4,4] with +5 reward (blue arrow).
The agent is blocked by obstacles in cells [3,3], [3,4], [3,5], and [4,3] (black cells).
All other actions result in a –1 reward.
First, create a GridWorld object using the createGridWorld function.
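A minimal sketch of the call that would produce the display below; the only inputs are the grid dimensions, and omitting the semicolon displays the object:

% Create a 5-by-5 grid world model
GW = createGridWorld(5,5)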
GW =
GridWorld with properties:
GridSize: [5 5]
CurrentState: "[1,1]"
States: [25×1 string]
Actions: [4×1 string]
T: [25×25×4 double]
R: [25×25×4 double]
ObstacleStates: [0×1 string]
TerminalStates: [0×1 string]
ProbabilityTolerance: 8.8818e-16
Then set the initial, terminal, and obstacle states.
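A sketch of these assignments, using the cell names from the rules above (GridWorld states are named with strings of the form "[row,column]"):

% Initial state (red circle), terminal state (blue cell), and obstacles (black cells)
GW.CurrentState = '[2,1]';
GW.TerminalStates = '[5,5]';
GW.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];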
Update the state transition matrix for the obstacle states and set the jump rule over the obstacle states.
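A sketch of this step, assuming the toolbox functions updateStateTranstionForObstacles (which makes the transition matrix consistent with the obstacle cells) and state2idx (which converts a state name to its index in T):

% Make transitions consistent with the obstacle cells
updateStateTranstionForObstacles(GW)

% Jump rule: any action taken in [2,4] moves the agent to [4,4]
GW.T(state2idx(GW,"[2,4]"),:,:) = 0;
GW.T(state2idx(GW,"[2,4]"),state2idx(GW,"[4,4]"),:) = 1;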
Next, define the rewards in the reward transition matrix.
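A sketch of the reward definition implied by the rules above: –1 for every move, +5 for the jump, and +10 for reaching the terminal state (state2idx is again assumed for name-to-index conversion):

% Default reward of -1 for every transition
nS = numel(GW.States);
nA = numel(GW.Actions);
GW.R = -1*ones(nS,nS,nA);

% Reward of +5 for the jump from [2,4] to [4,4]
GW.R(state2idx(GW,"[2,4]"),state2idx(GW,"[4,4]"),:) = 5;

% Reward of +10 for any transition into the terminal state [5,5]
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;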
Now, use rlMDPEnv to create a grid world environment using the GridWorld object GW.
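The call that would produce the display below:

% Wrap the GridWorld model in an MDP environment
env = rlMDPEnv(GW)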
env =
rlMDPEnv with properties:
Model: [1×1 rl.env.GridWorld]
ResetFcn: []
You can visualize the grid world environment using the plot function.
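For example, the following call opens a figure showing the grid, the agent location, and the terminal and obstacle cells:

% Display the grid world
plot(env)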
Use the getActionInfo and getObservationInfo functions to extract the action and observation specification objects from the environment.
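For example, the action specification shown below could be obtained with:

actInfo = getActionInfo(env)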
actInfo =
rlFiniteSetSpec with properties:
Elements: [4×1 double]
Name: "MDP Actions"
Description: [0×0 string]
Dimension: [1 1]
DataType: "double"
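Similarly for the observation specification:

obsInfo = getObservationInfo(env)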
obsInfo =
rlFiniteSetSpec with properties:
Elements: [25×1 double]
Name: "MDP Observations"
Description: [0×0 string]
Dimension: [1 1]
DataType: "double"
You can now use the action and observation specifications to create an agent for your environment, and then use the train and sim functions to train and simulate the agent within the environment.
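As an illustration only, a tabular Q-learning agent could be assembled and trained roughly as follows; the agent type, options, and training settings here are assumptions, not part of this example:

% Illustrative sketch (assumed agent type and settings)
qTable = rlTable(obsInfo,actInfo);                  % tabular Q-value model
critic = rlQValueFunction(qTable,obsInfo,actInfo);  % Q-value function from the table
agentOpts = rlQAgentOptions;                        % default Q-learning agent options
agent = rlQAgent(critic,agentOpts);                 % Q-learning agent

trainOpts = rlTrainingOptions("MaxEpisodes",200,"MaxStepsPerEpisode",50);
trainStats = train(agent,env,trainOpts);            % train the agent in the environment
experience = sim(env,agent);                        % simulate the trained agent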