RL designer toolbox | PPO agent | NaN output as policy

Dear all,
I'm using a PPO RL agent in a Simulink environment, and I'm training the agent with the Reinforcement Learning Designer app.
I'm getting the following error in the middle of the training at various episodes:
Error:Block '.../agent Obj/Evaluate Policy/Execute Policy/Enabled Policy Evaluator/Policy Evaluator/Policy Process Experience Internal' outputs 'NaN' for element 1 of output port 1 at major time step 0.
My agent's action is a single numeric value that should be bounded within [-1, 1], so I would not expect any NaN values.
I'm using an rlContinuousGaussianActor. The standard deviation path ends in a softplusLayer, and the mean path ends in a tanhLayer followed by a scalingLayer with Scale=1. The action specification is defined with the following command:
actInfo = rlNumericSpec([1 1], 'LowerLimit', -1, 'UpperLimit', 1);
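For context, an actor with the structure described above can be assembled roughly as in the sketch below. The observation size (4), the hidden layer width (64), and all layer names are placeholders assumed for illustration; they are not taken from the actual model.

```matlab
% Hypothetical observation spec; the real observation size is not stated in this post.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1], 'LowerLimit', -1, 'UpperLimit', 1);

% Shared input path (layer width and names are placeholders).
commonPath = [
    featureInputLayer(prod(obsInfo.Dimension), 'Name', 'obs')
    fullyConnectedLayer(64, 'Name', 'fc_common')
    reluLayer('Name', 'relu_common')];

% Mean path: tanh followed by a scaling layer with Scale = 1.
meanPath = [
    fullyConnectedLayer(1, 'Name', 'fc_mean')
    tanhLayer('Name', 'tanh_mean')
    scalingLayer('Name', 'mean_out', 'Scale', 1)];

% Standard deviation path: softplus keeps the output positive.
stdPath = [
    fullyConnectedLayer(1, 'Name', 'fc_std')
    softplusLayer('Name', 'std_out')];

% Assemble the two-headed network.
net = layerGraph(commonPath);
net = addLayers(net, meanPath);
net = addLayers(net, stdPath);
net = connectLayers(net, 'relu_common', 'fc_mean');
net = connectLayers(net, 'relu_common', 'fc_std');
net = dlnetwork(net);

% Tell the actor which outputs carry the mean and the standard deviation.
actor = rlContinuousGaussianActor(net, obsInfo, actInfo, ...
    'ObservationInputNames', 'obs', ...
    'ActionMeanOutputNames', 'mean_out', ...
    'ActionStandardDeviationOutputNames', 'std_out');
```

With this layout the mean is confined to [-1, 1] by the tanhLayer, while the standard deviation is only constrained to be positive by the softplusLayer.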
I'm resetting my environment at each episode using the following command:
simEnv.ResetFcn = @(in) setVariable(in,"q",0,"Workspace",mdl);
I'm sure there's no singularity in my model, as the simulation runs without errors on its own. The error only occurs during training with the toolbox, and it does not occur on every run.
I'd appreciate it if you could help me solve this issue.
Thank you very much.

Answers (0)

Release

R2024a
