How to compute the gradient of deep Actor network in DRL (with regard to all of its parameters)?
I'm trying to train a policy network that drives a self-learning agent.
Referring to the following example from MATLAB, I need to ask:
First, the "environment" associated with my research is very complicated and is far removed from any of the predefined examples included in MATLAB. How should I define "obsInfo" and "actInfo"?
In the official document, they’re extracted from the default environment, which doesn’t work for my case:
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
In other words, is it necessary to define an environment for my own problem first if I want to use MATLAB for deep reinforcement learning?
Second, in case the environment is too complex (or time-consuming) to define, how can I compute the gradient of the output of the policy network with respect to each of its parameters (weights and biases), as done in the example? It seems these functions cannot be used if I run the simulation without defining an RL environment (relevant to the first question):
% 6. Compute the gradient of the loss with respect to the policy parameters.
actorGradient = gradient(actor,'loss-parameters', ...
    {observationBatch},lossData);
% 7. Update the actor network using the computed gradients.
actor = optimize(actor,actorGradient);
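For reference, the obsInfo and actInfo specification objects can also be constructed directly, without extracting them from an environment. A minimal sketch, assuming a continuous 4-element observation vector and a discrete three-valued action (both dimensions are placeholders, not taken from the question):

```matlab
% Define observation/action specs by hand (no environment needed).
% The dimensions and limits below are illustrative assumptions.
obsInfo = rlNumericSpec([4 1]);      % 4-by-1 continuous observation
obsInfo.Name = 'observations';

actInfo = rlFiniteSetSpec([-1 0 1]); % discrete action set {-1, 0, 1}
actInfo.Name = 'action';
```

These spec objects are what the actor/critic representation constructors consume, so they can be created up front even before any environment exists.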
2 Comments
Emmanouil Tzorakoleftherakis
on 8 Jan 2021
Edited: Emmanouil Tzorakoleftherakis
on 8 Jan 2021
Are you trying to implement a custom RL algorithm? It seems so; otherwise, you don't need to calculate gradients and run the optimization yourself (you can use one of the provided built-in algorithms that do that for you).
Answers (1)
Emmanouil Tzorakoleftherakis
on 16 Jan 2021
In the link you provide above, the gradients are calculated with the "gradient" function, which uses automatic differentiation. So as long as you call this function properly, you should be all set.
Regarding the predefined environments, there are many shipping examples that use custom environments as well. It sounds like your environment is in MATLAB (not Simulink), so I recommend taking a look at this example to see how the rocket environment is implemented.
Hope that helps!
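As a rough sketch of the custom-environment approach described above: a MATLAB environment can be assembled from spec objects plus step and reset function handles using rlFunctionEnv. The specs, limits, and the function names myStepFcn/myResetFcn below are all placeholders you would replace with your own problem's definitions:

```matlab
% Hypothetical specs for illustration only.
obsInfo = rlNumericSpec([3 1]);
actInfo = rlNumericSpec([1 1],'LowerLimit',-2,'UpperLimit',2);

% You implement the dynamics in these two functions:
%   [NextObs,Reward,IsDone,LoggedSignals] = myStepFcn(Action,LoggedSignals)
%   [InitialObs,LoggedSignals]            = myResetFcn()
env = rlFunctionEnv(obsInfo,actInfo,@myStepFcn,@myResetFcn);
```

Once env exists, getObservationInfo(env) and getActionInfo(env) return exactly the specs passed in, so the rest of the documented workflow applies unchanged.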
1 Comment
Tesfay Gebrekidan
on 6 Mar 2021
I have read the options for the gradient in the following:
help rl.representation.rlAbstractRepresentation.gradient
open rl.representation.rlAbstractRepresentation.gradient
There are three options for computing gradients: loss to parameters, output to input, and output to parameters. How can I define the gradients for my custom DDPG training loop, which uses two kinds of gradients: one from the critic output to the actor output, and one from the critic output to the actor parameters?
The actor has multiple outputs. A screenshot of the MATLAB documentation for DDPG actor training is here.
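The DDPG actor update chains exactly the two modes listed above: first an 'output-input' gradient of the critic with respect to its action input (dQ/da), then an 'output-parameters' gradient of the actor seeded with that result (dQ/da * da/dtheta). A hedged sketch, assuming the fourth argument of 'output-parameters' accepts an initial gradient (check `help rl.representation.rlAbstractRepresentation.gradient` for the exact signature in your release; all variable names are illustrative):

```matlab
% dQ/d(inputs): gradient of the critic output w.r.t. its inputs;
% keep the component corresponding to the action input.
criticGradAction = gradient(critic,'output-input', ...
    {observationBatch,actionBatch});

% dQ/dtheta: actor parameter gradient, chained through the
% critic's action gradient as the seed (assumed 4th argument).
actorGradient = gradient(actor,'output-parameters', ...
    {observationBatch},criticGradAction);
```

The sign convention matters: DDPG ascends Q, so either negate this gradient before a gradient-descent update or configure the optimizer accordingly.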