How to compute the gradient of a deep Actor network in DRL (with respect to all of its parameters)?

I'm now trying to train a policy network that drives a self-learning agent.
Working from the following MATLAB example, I have two questions.
First, the environment associated with my research is very complicated and is far from any of the pre-defined examples included in MATLAB. How should I define obsInfo and actInfo?
In the official documentation, they're extracted from the default environment, which doesn't work for my case:
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
In other words, is it necessary for me to define an environment for my own problem first if I want to use MATLAB for deep reinforcement learning?
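For instance, would manually constructing the specification objects be the intended approach? Here is a rough sketch of what I have in mind (the 6-element observation and the four discrete actions are only placeholders, not my real problem):
% Observation: a 6-element continuous vector (placeholder dimensions)
obsInfo = rlNumericSpec([6 1]);
obsInfo.Name = 'observations';
% Action: one of four discrete choices (placeholder values)
actInfo = rlFiniteSetSpec([1 2 3 4]);
actInfo.Name = 'actions';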
Second, in case the environment is too complex (or time-consuming) to define, how can I compute the gradient of the policy network's output with respect to each of its parameters (weights and biases), as done in the example? It seems those functions cannot be used if I run the simulation without defining an RL environment (related to the first question).
% 6. Compute the gradient of the loss with respect to the policy parameters.
actorGradient = gradient(actor,'loss-parameters', ...
    {observationBatch},lossData);
% 7. Update the actor network using the computed gradients.
actor = optimize(actor,actorGradient);
  2 Comments
Emmanouil Tzorakoleftherakis
Edited: Emmanouil Tzorakoleftherakis on 8 Jan 2021
Are you trying to implement a custom RL algorithm? It seems so; otherwise you wouldn't need to calculate gradients and run the optimization yourself (you could use one of the provided built-in algorithms that do that for you).
Li Sun
Li Sun on 9 Jan 2021
Dear Emmanouil: Many thanks for your timely reply!!
Yes, you're exactly right: I'm trying to implement a customized deep reinforcement learning algorithm. It seems that the examples included in the official MATLAB documentation all run on a pre-defined environment (e.g. the cart-pole).
Nevertheless, the problem I'm trying to solve is rather complicated and stochastic, and thus hard to pre-define.
Suppose, for example, that I define my policy network in the following way (almost the same as the official example):
actorNetwork = [featureInputLayer(6,'Normalization','none','Name','state')
    fullyConnectedLayer(24,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(24,'Name','fc2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(4,'Name','output')
    softmaxLayer('Name','actionProb')];
lgraph = layerGraph(actorNetwork);
dlnet = dlnetwork(lgraph);
In order to update the policy, gradient ascent (not descent) needs to be performed.
To that end, the gradient of log P(a|s,θ) with respect to each of the weights and biases must be computed.
My ultimate question is whether and how those gradients can be calculated WITHOUT any of the pre-defined environments.
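Concretely, with dlnet defined as above, is something like the following sketch the right direction? (The observation, the chosen action index, and the function name policyGradFcn are made up purely for illustration; in practice the loss would also be weighted by the return.)
% One made-up observation and action, formatted as a dlarray for the feature input layer
dlObs = dlarray(rand(6,1),'CB');   % 6 channels x 1 batch observation
actionIdx = 3;                     % index of the action that was taken
% dlfeval enables automatic differentiation inside policyGradFcn
[loss,grads] = dlfeval(@policyGradFcn,dlnet,dlObs,actionIdx);
% Local function returning -log P(a|s,theta) and its gradients w.r.t. all weights and biases
function [loss,gradients] = policyGradFcn(dlnet,dlObs,actionIdx)
    actionProb = forward(dlnet,dlObs);              % 4x1 softmax output
    loss = -log(actionProb(actionIdx));             % negative log-likelihood of the taken action
    gradients = dlgradient(loss,dlnet.Learnables);  % table of gradients matching dlnet.Learnables
end
A gradient-ascent step on log P would then simply be a descent step on this negative log-likelihood, e.g. via adamupdate or dlupdate.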
Or, put another way, suppose I define the policy network in the following (somewhat more straightforward) way:
Policy_network = feedforwardnet([20 20]);
Can the gradient also be computed for such a network?
Thanks again, for the very gracious help!


Answers (1)

Emmanouil Tzorakoleftherakis
In the link you provide above, the gradients are calculated with the "gradient" function that uses automatic differentiation. So as long as you call this function properly, you should be all set.
Regarding the predefined environments, there are a lot of shipping examples that use "custom environments" as well. Sounds like your environment is in MATLAB (not Simulink), so I recommend taking a look at this example to see how the rocket environment is implemented.
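If writing a full environment class feels like too much, a custom MATLAB environment can also be assembled from step and reset function handles. Here is a minimal sketch; the specs and function names are placeholders you would replace with your own:
% Placeholder specs describing your observation and action spaces
obsInfo = rlNumericSpec([6 1]);
actInfo = rlFiniteSetSpec([1 2 3 4]);
% myStepFcn and myResetFcn are your own functions with signatures:
%   [NextObs,Reward,IsDone,LoggedSignals] = myStepFcn(Action,LoggedSignals)
%   [InitialObs,LoggedSignals] = myResetFcn()
env = rlFunctionEnv(obsInfo,actInfo,@myStepFcn,@myResetFcn);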
Hope that helps!
  1 Comment
Tesfay Gebrekidan
Tesfay Gebrekidan on 6 Mar 2021
I have read the options for the gradient function in the following:
help rl.representation.rlAbstractRepresentation.gradient
open rl.representation.rlAbstractRepresentation.gradient
There are three options for computing gradients: loss to parameters, output to input, and output to parameters. How can I define a gradient for my custom DDPG training loop, which uses two types of gradients: one from the critic output to the actor output, and one from the critic output to the parameters?
The actor has multiple outputs. A screenshot of the MATLAB documentation for DDPG actor training is here.
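For context, what I am essentially trying to reproduce, written with dlnetwork objects instead of the representation objects, would look roughly like the sketch below. actorNet, criticNet and obsBatch are placeholder names, and this assumes a single actor output rather than the multiple outputs in my real network; dlgradient handles the chain from critic output through actor output to actor parameters automatically.
% Compute the DDPG actor gradient: maximize Q(s,mu(s)) by minimizing -Q
[actorLoss,actorGrads] = dlfeval(@ddpgActorGradFcn,actorNet,criticNet,obsBatch);
function [actorLoss,actorGradients] = ddpgActorGradFcn(actorNet,criticNet,obsBatch)
    actions = forward(actorNet,obsBatch);        % deterministic policy a = mu(s)
    q = forward(criticNet,obsBatch,actions);     % critic value Q(s,mu(s))
    actorLoss = -mean(q,'all');                  % ascend Q by descending -Q
    actorGradients = dlgradient(actorLoss,actorNet.Learnables);
end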
