How to compute the gradient of a deep Actor network in DRL (with respect to all of its parameters)?

I'm now trying to train a policy network that drives a self-learning agent.
Working from the following MATLAB example, I have two questions.
First, the environment associated with my research is very complicated and is far from any of the pre-defined examples included in MATLAB. How should I define obsInfo and actInfo?
In the official documentation, they're extracted from the default environment, which doesn't work for my case:
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
In other words, is it necessary for me to define an environment for my own problem first if I want to use MATLAB for deep reinforcement learning?
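For instance, would manually constructing the specification objects be the intended approach? Here is a rough sketch of what I have in mind (the 6-element observation and the four discrete actions are only placeholders, not my real problem):
% Observation: a 6-element continuous vector (placeholder dimensions)
obsInfo = rlNumericSpec([6 1]);
obsInfo.Name = 'observations';
% Action: one of four discrete choices (placeholder values)
actInfo = rlFiniteSetSpec([1 2 3 4]);
actInfo.Name = 'actions';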
Second, in case the environment is too complex (or time-consuming) to define, how can I compute the gradient of the policy network's output with respect to each of its parameters (weights and biases), as done in the example? It seems those functions cannot be used if I run the simulation without defining an RL environment (related to the first question).
% 6. Compute the gradient of the loss with respect to the policy parameters.
actorGradient = gradient(actor,'loss-parameters', ...
    {observationBatch},lossData);
% 7. Update the actor network using the computed gradients.
actor = optimize(actor,actorGradient);
  2 Comments
Emmanouil Tzorakoleftherakis
Edited: Emmanouil Tzorakoleftherakis on 8 Jan 2021
Are you trying to implement a custom RL algorithm? It seems so; otherwise you wouldn't need to calculate gradients and run the optimization yourself (you could use one of the provided built-in algorithms that do that for you).
Li Sun
Li Sun on 9 Jan 2021
Dear Emmanouil: Many thanks for your timely reply!!
Yes, you're exactly right: I'm trying to implement a customized deep reinforcement learning algorithm. It seems that the examples included in the official MATLAB documentation all run on a pre-defined environment (e.g. the cart-pole).
Nevertheless, the problem I'm trying to solve is rather complicated and stochastic, and thus hard to pre-define.
Suppose, for example, that I define my policy network in the following way (almost the same as the official example):
actorNetwork = [featureInputLayer(6,'Normalization','none','Name','state')
    fullyConnectedLayer(24,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(24,'Name','fc2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(4,'Name','output')
    softmaxLayer('Name','actionProb')];
lgraph = layerGraph(actorNetwork);
dlnet = dlnetwork(lgraph);
In order to update the policy, gradient ascent (not descent) needs to be performed.
To that end, the gradient of log P(a|s,θ) with respect to each of the weights and biases must be computed.
My ultimate question is whether and how those gradients can be calculated WITHOUT any of the pre-defined environments.
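Concretely, with dlnet defined as above, is something like the following sketch the right direction? (The observation, the chosen action index, and the function name policyGradFcn are made up purely for illustration; in practice the loss would also be weighted by the return.)
% One made-up observation and action, formatted as a dlarray for the feature input layer
dlObs = dlarray(rand(6,1),'CB');   % 6 channels x 1 batch observation
actionIdx = 3;                     % index of the action that was taken
% dlfeval enables automatic differentiation inside policyGradFcn
[loss,grads] = dlfeval(@policyGradFcn,dlnet,dlObs,actionIdx);
% Local function returning -log P(a|s,theta) and its gradients w.r.t. all weights and biases
function [loss,gradients] = policyGradFcn(dlnet,dlObs,actionIdx)
    actionProb = forward(dlnet,dlObs);              % 4x1 softmax output
    loss = -log(actionProb(actionIdx));             % negative log-likelihood of the taken action
    gradients = dlgradient(loss,dlnet.Learnables);  % table of gradients matching dlnet.Learnables
end
A gradient-ascent step on log P would then simply be a descent step on this negative log-likelihood, e.g. via adamupdate or dlupdate.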
Or, put another way, suppose I define the policy network in the following (somewhat more straightforward) way:
Policy_network = feedforwardnet([20 20]);
Can the gradient also be computed for such a network?
Thanks again, for the very gracious help!


Answers (1)

Emmanouil Tzorakoleftherakis
In the link you provide above, the gradients are calculated with the "gradient" function that uses automatic differentiation. So as long as you call this function properly, you should be all set.
Regarding the predefined environments, there are a lot of shipping examples that use "custom environments" as well. Sounds like your environment is in MATLAB (not Simulink), so I recommend taking a look at this example to see how the rocket environment is implemented.
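If writing a full environment class feels like too much, a custom MATLAB environment can also be assembled from step and reset function handles. Here is a minimal sketch; the specs and function names are placeholders you would replace with your own:
% Placeholder specs describing your observation and action spaces
obsInfo = rlNumericSpec([6 1]);
actInfo = rlFiniteSetSpec([1 2 3 4]);
% myStepFcn and myResetFcn are your own functions with signatures:
%   [NextObs,Reward,IsDone,LoggedSignals] = myStepFcn(Action,LoggedSignals)
%   [InitialObs,LoggedSignals] = myResetFcn()
env = rlFunctionEnv(obsInfo,actInfo,@myStepFcn,@myResetFcn);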
Hope that helps!
  1 Comment
Tesfay Gebrekidan
Tesfay Gebrekidan on 6 Mar 2021
I have read the options for the gradient function in the following:
help rl.representation.rlAbstractRepresentation.gradient
open rl.representation.rlAbstractRepresentation.gradient
There are three options for computing gradients: loss to parameters, output to input, and output to parameters. How can I define a gradient for my custom DDPG training loop, which uses two types of gradients: one from the critic output to the actor output, and one from the critic output to the parameters?
The actor has multiple outputs. A screenshot of the MATLAB documentation for DDPG actor training is here.
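For context, what I am essentially trying to reproduce, written with dlnetwork objects instead of the representation objects, would look roughly like the sketch below. actorNet, criticNet and obsBatch are placeholder names, and this assumes a single actor output rather than the multiple outputs in my real network; dlgradient handles the chain from critic output through actor output to actor parameters automatically.
% Compute the DDPG actor gradient: maximize Q(s,mu(s)) by minimizing -Q
[actorLoss,actorGrads] = dlfeval(@ddpgActorGradFcn,actorNet,criticNet,obsBatch);
function [actorLoss,actorGradients] = ddpgActorGradFcn(actorNet,criticNet,obsBatch)
    actions = forward(actorNet,obsBatch);        % deterministic policy a = mu(s)
    q = forward(criticNet,obsBatch,actions);     % critic value Q(s,mu(s))
    actorLoss = -mean(q,'all');                  % ascend Q by descending -Q
    actorGradients = dlgradient(actorLoss,actorNet.Learnables);
end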
