What's the difference between getAction and predict in RL and why does it change with agent and actor?

Hi all,
I am trying to import the neural network of my PPO actor via ONNX. I followed the steps shown in Train DDPG Agent with Pretrained Actor Network (adapted to PPO). I do not import a critic because my network is ready to be deployed. When I check the output of predict(...), it matches what I have in Python. However, getAction(agent,{testData}) and getAction(actor,{testData}) differ from predict(...) and from each other. Moreover, they change on every run even if the input is kept constant (for example, feeding an array of ones). Can someone clarify why the output of getAction changes between agent and actor, and why it does not match the output of the neural network?
Best regards,
Kevin
Here is the code I used and one set of results:
agentAction = -0.9091
actorAction = -0.8572
predictImNN = 0.8436
actorNetwork = importONNXNetwork("C:\...\ppo_model.onnx",'TargetNetwork',"dlnetwork", "InputDataFormats",'BC');
actorNetwork = layerGraph(actorNetwork);
low_limit = transpose([0.0, -pi, -20000.0, -20000.0, -1.5, -20000, -20000, -2, -3, -3.5, -4]);
upper_limit = transpose([20.0, pi, 20000.0, 20000.0, 1.5, 20000, 20000, 2, 3, 3.5, 4]);
obsInfo = rlNumericSpec([11 1], 'LowerLimit',low_limit, 'UpperLimit',upper_limit);
actInfo = rlNumericSpec([1 1],'LowerLimit',-0.18,'UpperLimit',0.18);
% Remove the custom ONNX layers that code generation does not support, then reconnect the graph
actorNetwork = removeLayers(actorNetwork, 'onnx__Gemm_0_BatchSizeVerifier');
actorNetwork = removeLayers(actorNetwork, 'x25Output');
actorNetwork = removeLayers(actorNetwork, 'x26Output');
actorNetwork = connectLayers(actorNetwork, 'onnx__Gemm_0', 'Gemm_0');
% Get the names of the layers required to generate the actor
netMeanActName = actorNetwork.Layers(12).Name;
netStdActName = actorNetwork.Layers(13).Name;
netObsNames = actorNetwork.Layers(1).Name;
actor = rlContinuousGaussianActor(actorNetwork,obsInfo,actInfo,'ActionMeanOutputNames', netMeanActName, 'ActionStandardDeviationOutputNames', netStdActName, 'ObservationInputNames', netObsNames);
agent = rlPPOAgent(obsInfo, actInfo);
agent = setActor(agent,actor);
% Check that the network used by the agent is the same one that was loaded. To do so, evaluate the network, the actor, and the agent using the same test observation.
testData = ones(11,1);
% Evaluate the agent and the actor
agentAction = getAction(agent,{testData})
actorAction = getAction(actor,{testData})
% Evaluate the imported network directly (this captures only the first network output)
predictImNN = predict(getModel(getActor(agent)),dlarray(testData','BC'))

Accepted Answer

Ari Biswas on 26 Jan 2023
A PPO agent with a continuous action space uses a stochastic policy. The network has two outputs: the mean and the standard deviation of a Gaussian distribution.
Calling getAction on the agent or the actor returns an action sampled from that distribution using the mean and standard-deviation outputs of the network, so repeated calls return different values.
Calling predict on the network returns the mean and standard deviation values themselves. Use [mean,std] = predict(...) to capture both outputs.
Also, make sure you compare results from the same random number generator state. For example, execute rng(0) before evaluating the networks each time.
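For example, something along these lines (an illustrative sketch that reuses the actor and testData from the question; mu, sigma, and manualAction are placeholder names, and the manual draw is only meant to show the relationship, not to reproduce getAction bit for bit):
% Reset the generator so the stochastic sampling is repeatable
rng(0)
actorAction = getAction(actor,{testData})
% Capture both network outputs (mean and standard deviation; check
% getModel(actor).OutputNames for their order)
[mu,sigma] = predict(getModel(actor),dlarray(testData','BC'));
% getAction samples the action from a Gaussian built from these two outputs,
% so a manual draw from the same generator state should be comparable
rng(0)
manualAction = mu + sigma.*randn(size(mu))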
