Error appears when setting the multi-dimensional actions in Matlab Environment (Reinforcement Learning Toolbox)

2 views (last 30 days)
As shown in the following codes, three actions, whose ranges are [0.1 10], [0.1 10] and [0 pi], respectively, are set:
%% main.m
%% Observation
ObservationInfo = rlNumericSpec([7 1]);
ObservationInfo.Name = 'Obstacle Avoidance States';
ObservationInfo.Description = 'delta_x, delta_y, delta_z, delta_L, delta_V, pusi, theta';
%% Action
ActionInfo = rlNumericSpec([3 1],'LowerLimit',[0.1 0.1 0]','UpperLimit',[10 10 pi]');
ActionInfo.Name = 'Action';
%% Environment
env = rlFunctionEnv(ObservationInfo,ActionInfo,'myStepFunction','myResetFunction');
rng(0);
InitialObs = reset(env);
Then, the actions are assigned three variables in the function myStepFunction (Action, LoggedSignals) as follows:
function [NextObs,Reward,IsDone,LoggedSignals] = myStepFunction(Action,LoggedSignals)
para_rho1 = Action(1);
para_rho2 = Action(2);
para_theta = Action(3);
......
end
Run the main.m, it is normal.
However, when running the following instruction, an error appears:
step(env,10)
Index exceeds the number of array elements (1)
Error: myStepFunction (line 23)
para_rho2 = Action(2);
Why does the dimension of the action is changed as 1? How to address this error?
If I set the variables para_rho2 and para_theta as constants, and change the dimension of the action as [1 1] in rlNumericSpec, then the instruction step(env,10) can be normally executed.

Answers (1)

Ryan Comeau
Ryan Comeau on 29 May 2020
Hello, so I've take a look at the rocket lander code environment which MATLAB gives as an example. What they do, it that every action is scaled between 0 and 1. The maximum values for the actions are then stored in your environment properties as a vector of values defining the min and max values. When we hop into the step function, the actions get scaled. I know this seems strange, and i'm not sure if it's the best approach but i'm not a veteran of RL. So, what you should do is the following:
%% main.m
%% Observation
ObservationInfo = rlNumericSpec([7 1]);
ObservationInfo.Name = 'Obstacle Avoidance States';
ObservationInfo.Description = 'delta_x, delta_y, delta_z, delta_L, delta_V, pusi, theta';
%% Action
ActionInfo = rlNumericSpec([3 1 1],'LowerLimit',0,'UpperLimit',1);
ActionInfo.Name = 'Action';
%stuff...
function [NextObs,Reward,IsDone,LoggedSignals] = myStepFunction(Action,LoggedSignals)
para_rho1 = Action(1).*env.borders(1); %so open rocket lander from MATLAB and take a look
para_rho2 = Action(2).*env.borders(2);
para_theta = Action(3).*env.borders(3);
......
end
Hope this helps
RC
  2 Comments
wujianfa93
wujianfa93 on 1 Jun 2020
Edited: wujianfa93 on 1 Jun 2020
Many thanks for your reply! But I think I have solved this problem. When I continued designing the corresponding DDPG training process and start training, I found that the whole programme can correctly run. So I guess the verification using the function step() may not be necessary.

Sign in to comment.

Products


Release

R2020a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!