
generatePolicyFunction

Create function that evaluates trained policy of reinforcement learning agent

Description


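generatePolicyFunction(agent) creates a function that evaluates the learned policy of the specified agent, using the default function, policy, and data file names. After generating the policy evaluation function, you can generate code for it, for example using MATLAB® Coder™. For more information, see Deploy Trained Reinforcement Learning Policies.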


generatePolicyFunction(agent,Name,Value) specifies the function, policy, and data file names using one or more name-value pair arguments.

Examples


This example shows how to create a policy evaluation function for a policy gradient (PG) agent.

First, create and train a reinforcement learning agent. For this example, load the PG agent trained in Train PG Agent to Balance Cart-Pole System:

load('MATLABCartpolePG.mat','agent')

Then, create a policy evaluation function for this agent using default names:

generatePolicyFunction(agent);

This command creates the evaluatePolicy.m file, which contains the policy function, and the agentData.mat file, which contains the trained deep neural network actor.

View the generated function.

type evaluatePolicy.m
function action1 = evaluatePolicy(observation1)
%#codegen

% Reinforcement Learning Toolbox
% Generated on: 25-Aug-2020 17:27:49

actionSet = [-10 10];
% Select action from sampled probabilities
probabilities = localEvaluate(observation1);
% Normalize the probabilities
p = probabilities(:)'/sum(probabilities);
% Determine which action to take
edges = min([0 cumsum(p)],1);
edges(end) = 1;
[~,actionIndex] = histc(rand(1,1),edges); %#ok<HISTC>
action1 = actionSet(actionIndex);
end
%% Local Functions
function probabilities = localEvaluate(observation1)
persistent policy
if isempty(policy)
	policy = coder.loadDeepLearningNetwork('agentData.mat','policy');
end
observation1 = observation1(:);
probabilities = predict(policy,observation1);
end

For a given observation, the policy function evaluates a probability for each potential action using the actor network. Then, the policy function randomly selects an action based on these probabilities.
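The sampling step described above can be tried in isolation. The following is a minimal sketch, not the generated code itself: the probability vector is a made-up stand-in for the actor network output, and find replaces the histc call so that the snippet does not rely on a function that is no longer recommended.

```matlab
% Sketch of the action-sampling step with hypothetical probabilities.
actionSet = [-10 10];
probabilities = [0.3; 0.7];                 % assumed actor output, for illustration only

% Normalize to a row vector that sums to 1
p = probabilities(:)'/sum(probabilities);

% Cumulative bin edges: action k owns the interval [edges(k), edges(k+1))
edges = min([0 cumsum(p)],1);
edges(end) = 1;                             % guard against round-off in cumsum

% Draw a uniform sample and find the bin that contains it
u = rand;
actionIndex = find(u >= edges(1:end-1) & u < edges(2:end),1);
action = actionSet(actionIndex);
```

Over many draws, each action should be selected with a frequency close to its normalized probability.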

Since the actor network for this PG agent has a single input layer and a single output layer, you can generate code for this network using the Deep Learning Toolbox™ code generation functionality. For more information, see Deploy Trained Reinforcement Learning Policies.

This example shows how to create a policy evaluation function for a Q-learning agent.

For this example, load the Q-learning agent trained in Train Reinforcement Learning Agent in Basic Grid World:

load('basicGWQAgent.mat','qAgent')

Create a policy evaluation function for this agent and specify the name of the agent data file.

generatePolicyFunction(qAgent,'MATFileName',"policyFile.mat")

This command creates the evaluatePolicy.m file, which contains the policy function, and the policyFile.mat file, which contains the trained Q table value function.

View the generated function.

type evaluatePolicy.m
function action1 = evaluatePolicy(observation1)
%#codegen

% Reinforcement Learning Toolbox
% Generated on: 25-Aug-2020 17:27:50

actionSet = [1;2;3;4];
numActions = numel(actionSet);
q = zeros(1,numActions);
for i = 1:numActions
	q(i) = localEvaluate(observation1,actionSet(i));
end
[~,actionIndex] = max(q);
action1 = actionSet(actionIndex);
end
%% Local Functions
function q = localEvaluate(observation1,action)
persistent policy
if isempty(policy)
	s = coder.load('policyFile.mat','policy');
	policy = s.policy;
end
actionSet = [1;2;3;4];
observationSet = [1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23;24;25];
actionIndex = rl.codegen.getElementIndex(actionSet,action);
observationIndex = rl.codegen.getElementIndex(observationSet,observation1);
q = policy(observationIndex,actionIndex);
end

For a given observation, the policy function looks up the value function for each potential action using the Q table. Then, the policy function selects the action for which the value function is greatest.
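The greedy lookup described above can be illustrated with a small table. This is a minimal sketch under assumed values: the 3-by-4 Q table and the observation set are hypothetical, and a plain find call stands in for the index lookup that the generated function performs.

```matlab
% Hypothetical Q table: one row per observation, one column per action.
qTable = [0.1 0.5 0.2 0.0;
          0.9 0.3 0.4 0.1;
          0.0 0.0 0.7 0.2];
actionSet = [1;2;3;4];
observationSet = [1;2;3];

% Locate the row for the current observation
observation = 2;
observationIndex = find(observationSet == observation,1);

% Read the Q value of every action and pick the largest
q = qTable(observationIndex,:);
[~,actionIndex] = max(q);
action = actionSet(actionIndex);
```

For observation 2 the largest entry in the row is 0.9, in the first column, so the selected action is 1.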

You can generate code for this policy function using MATLAB® Coder™. For more information, see Deploy Trained Reinforcement Learning Policies.
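As a rough sketch of that step, the commands below build a MEX function from the generated policy function and call it. They assume that MATLAB Coder is installed, that evaluatePolicy.m and policyFile.mat from this example are on the path, and that the observation is a scalar double state index. The MEX target is only one possible choice of build configuration.

```matlab
% Assumes evaluatePolicy.m and policyFile.mat were generated as shown above.
% The example input {1} tells codegen that the observation is a scalar double.
codegen -config:mex evaluatePolicy -args {1} -report

% codegen names the MEX function evaluatePolicy_mex by default
action = evaluatePolicy_mex(1);
```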

Input Arguments


Trained reinforcement learning agent, specified as a reinforcement learning agent object, such as the PG and Q-learning agents used in the examples above.

Since Deep Learning Toolbox™ code generation and prediction functionality do not support deep neural networks with more than one input layer, generatePolicyFunction does not support the following agent configurations.

  • DQN agent with single-output deep neural network critic representations

  • Any agent with deep neural network actor or critic representations with multiple observation input layers

Note

DQN agents with a multi-output deep neural network representation are supported by generatePolicyFunction, provided that the network has only one input layer for the observations.

To train your agent, use the train function.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'FunctionName',"computeAction"

Name of the generated function, specified as the name-value pair consisting of 'FunctionName' and a string or character vector. The default name, used in the examples above, is evaluatePolicy.

Name of the policy variable within the generated function, specified as the name-value pair consisting of 'PolicyName' and a string or character vector. The default name is policy.

Name of the agent data file, specified as the name-value pair consisting of 'MATFileName' and a string or character vector. The default name is agentData.
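The three name-value pairs can be combined in a single call. The following sketch reuses the PG agent from the first example. The values "actorNetwork" and "actorData.mat" are illustrative names chosen here, not defaults.

```matlab
% Load the trained PG agent used in the first example
load('MATLABCartpolePG.mat','agent')

% Override all three default names (the values are illustrative)
generatePolicyFunction(agent, ...
    'FunctionName',"computeAction", ...
    'PolicyName',"actorNetwork", ...
    'MATFileName',"actorData.mat");
```

With these settings, the call should produce computeAction.m and actorData.mat in place of evaluatePolicy.m and agentData.mat.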

Introduced in R2019a