createMDP
Create Markov decision process model
Description
MDP = createMDP(states,actions) creates a Markov decision process model with the specified states and actions.
Examples
Create MDP Model
Create an MDP model with eight states and two possible actions.
MDP = createMDP(8,["up";"down"]);
Specify the state transitions and their associated rewards.
% State 1 Transition and Reward
MDP.T(1,2,1) = 1;
MDP.R(1,2,1) = 3;
MDP.T(1,3,2) = 1;
MDP.R(1,3,2) = 1;

% State 2 Transition and Reward
MDP.T(2,4,1) = 1;
MDP.R(2,4,1) = 2;
MDP.T(2,5,2) = 1;
MDP.R(2,5,2) = 1;

% State 3 Transition and Reward
MDP.T(3,5,1) = 1;
MDP.R(3,5,1) = 2;
MDP.T(3,6,2) = 1;
MDP.R(3,6,2) = 4;

% State 4 Transition and Reward
MDP.T(4,7,1) = 1;
MDP.R(4,7,1) = 3;
MDP.T(4,8,2) = 1;
MDP.R(4,8,2) = 2;

% State 5 Transition and Reward
MDP.T(5,7,1) = 1;
MDP.R(5,7,1) = 1;
MDP.T(5,8,2) = 1;
MDP.R(5,8,2) = 9;

% State 6 Transition and Reward
MDP.T(6,7,1) = 1;
MDP.R(6,7,1) = 5;
MDP.T(6,8,2) = 1;
MDP.R(6,8,2) = 1;

% State 7 Transition and Reward
MDP.T(7,7,1) = 1;
MDP.R(7,7,1) = 0;
MDP.T(7,7,2) = 1;
MDP.R(7,7,2) = 0;

% State 8 Transition and Reward
MDP.T(8,8,1) = 1;
MDP.R(8,8,1) = 0;
MDP.T(8,8,2) = 1;
MDP.R(8,8,2) = 0;
Specify the terminal states of the model.
MDP.TerminalStates = ["s7";"s8"];
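To use this model as an environment for a reinforcement learning agent, you can wrap it in an MDP environment object. The following is a minimal sketch assuming the Reinforcement Learning Toolbox rlMDPEnv environment workflow.

% Create an MDP environment from the model (requires Reinforcement Learning Toolbox).
env = rlMDPEnv(MDP);

% Inspect the observation and action specifications of the environment.
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);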
Input Arguments
states
— Model states
positive integer | string vector
Model states, specified as one of the following:
- Positive integer — Specify the number of model states. In this case, each state has a default name, such as "s1" for the first state.
- String vector — Specify the state names. In this case, the total number of states is equal to the length of the vector.
actions
— Model actions
positive integer | string vector
Model actions, specified as one of the following:
- Positive integer — Specify the number of model actions. In this case, each action has a default name, such as "a1" for the first action.
- String vector — Specify the action names. In this case, the total number of actions is equal to the length of the vector.
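For example, both of the following calls create a model with three states and two actions; the first relies on default names, and the second supplies explicit names (the names shown here are illustrative only).

% Default names: states "s1", "s2", "s3" and actions "a1", "a2".
MDP1 = createMDP(3,2);

% Explicit names: the vector lengths determine the numbers of states and actions.
MDP2 = createMDP(["low";"medium";"high"],["up";"down"]);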
Output Arguments
MDP
— MDP model
GenericMDP object
MDP model, returned as a GenericMDP object with the following properties.
CurrentState
— Name of the current state
string
Name of the current state, specified as a string.
States
— State names
string vector
State names, specified as a string vector with length equal to the number of states.
Actions
— Action names
string vector
Action names, specified as a string vector with length equal to the number of actions.
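For the eight-state model created in the example above, these properties hold the default state names and the specified action names. A quick check (expected output shown as comments, as a sketch):

MDP.States   % 8x1 string: "s1", "s2", ..., "s8"
MDP.Actions  % 2x1 string: "up", "down"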
T
— State transition matrix
3-D array
State transition matrix, specified as a 3-D array, which determines the possible movements of the agent in an environment. State transition matrix T is a probability matrix that indicates how likely the agent will move from the current state s to any possible next state s' by performing action a. T is an S-by-S-by-A array, where S is the number of states and A is the number of actions. It is given by:
T(s,s',a) = probability(s' | s, a)
The transition probabilities out of a nonterminal state s for a given action must sum to one. Therefore, all stochastic transitions out of a given state must be specified at the same time.
For example, to indicate that in state 1, following action 4, there is an equal probability of moving to state 2 or state 3, use the following:
MDP.T(1,[2 3],4) = [0.5 0.5];
You can also specify that, following an action, there is some probability of remaining in the same state. For example:
MDP.T(1,[1 2 3 4],1) = [0.25 0.25 0.25 0.25];
R
— Reward transition matrix
3-D array
Reward transition matrix, specified as a 3-D array, which determines how much reward the agent receives after performing an action in the environment. R has the same shape and size as the state transition matrix T. The reward for moving from state s to state s' by performing action a is given by:
r = R(s,s',a)
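For example, to reward the two equally likely transitions from the T property example differently, set the corresponding entries of R in the same call. The reward values here are illustrative only.

% Equal probability of moving from state 1 to state 2 or 3 under action 4,
% with a different reward for each possible next state (illustrative values).
MDP.T(1,[2 3],4) = [0.5 0.5];
MDP.R(1,[2 3],4) = [10 -1];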
TerminalStates
— Terminal state names in the grid world
string vector
Terminal state names in the grid world, specified as a string vector of state names.
Version History
Introduced in R2019a