What's the difference between getAction and predict in RL and why does it change with agent and actor?

Hi all,
I am trying to import the neural network of my PPO actor via ONNX. I followed the steps shown in Train DDPG Agent with Pretrained Actor Network (adapted to PPO). I do not import a critic because my network is ready to be deployed. When I check the output of predict(...), it matches what I have in Python. However, getAction(agent,{testData}) and getAction(actor,{testData}) differ from predict(...) and from each other. Moreover, they change on every run even if the input is kept constant (for example, feeding an array of ones). Can someone clarify why the output of getAction changes between agent and actor, and why it does not match the output of the neural network?
Best regards,
Kevin
Here is the code I used and one set of results:
agentAction = -0.9091
actorAction = -0.8572
predictImNN = 0.8436
actorNetwork = importONNXNetwork("C:\...\ppo_model.onnx",'TargetNetwork',"dlnetwork", "InputDataFormats",'BC');
actorNetwork = layerGraph(actorNetwork);
low_limit = transpose([0.0, -pi, -20000.0, -20000.0, -1.5, -20000, -20000, -2, -3, -3.5, -4]);
upper_limit = transpose([20.0, pi, 20000.0, 20000.0, 1.5, 20000, 20000, 2, 3, 3.5, 4]);
obsInfo = rlNumericSpec([11 1], 'LowerLimit',low_limit, 'UpperLimit',upper_limit);
actInfo = rlNumericSpec([1 1],'LowerLimit',-0.18,'UpperLimit',0.18);
% Remove the custom ONNX layers that code generation does not support, then reconnect the graph
actorNetwork = removeLayers(actorNetwork, 'onnx__Gemm_0_BatchSizeVerifier');
actorNetwork = removeLayers(actorNetwork, 'x25Output');
actorNetwork = removeLayers(actorNetwork, 'x26Output');
actorNetwork = connectLayers(actorNetwork, 'onnx__Gemm_0', 'Gemm_0');
% Get the names of the layers required to generate the actor
netMeanActName = actorNetwork.Layers(12).Name;
netStdActName = actorNetwork.Layers(13).Name;
netObsNames = actorNetwork.Layers(1).Name;
actor = rlContinuousGaussianActor(actorNetwork,obsInfo,actInfo,'ActionMeanOutputNames', netMeanActName, 'ActionStandardDeviationOutputNames', netStdActName, 'ObservationInputNames', netObsNames);
agent = rlPPOAgent(obsInfo, actInfo);
agent = setActor(agent,actor);
% Check that the network used by the agent is the same one that was loaded. To do so, evaluate the network, the actor, and the agent using the same test observation.
testData = ones(11,1);
% Evaluate the agent and the actor
agentAction = getAction(agent,{testData})
actorAction = getAction(actor,{testData})
% Evaluate the imported network directly (this captures only the first network output)
predictImNN = predict(getModel(getActor(agent)),dlarray(testData','BC'))

Accepted Answer

Ari Biswas on 26 Jan 2023
A PPO agent with a continuous action space uses a stochastic policy. The network has two outputs: the mean and the standard deviation of a Gaussian distribution.
Calling getAction on the agent or the actor returns an action sampled from that distribution using the mean and standard-deviation outputs of the network, so repeated calls return different values.
Calling predict on the network returns the mean and standard deviation values themselves. Use [mean,std] = predict(...) to capture both outputs.
Also, make sure you compare results from the same random number generator state. For example, execute rng(0) before evaluating the networks each time.
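For example, something along these lines (an illustrative sketch that reuses the actor and testData from the question; mu, sigma, and manualAction are placeholder names, and the manual draw is only meant to show the relationship, not to reproduce getAction bit for bit):
% Reset the generator so the stochastic sampling is repeatable
rng(0)
actorAction = getAction(actor,{testData})
% Capture both network outputs (mean and standard deviation; check
% getModel(actor).OutputNames for their order)
[mu,sigma] = predict(getModel(actor),dlarray(testData','BC'));
% getAction samples the action from a Gaussian built from these two outputs,
% so a manual draw from the same generator state should be comparable
rng(0)
manualAction = mu + sigma.*randn(size(mu))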
