DDPG with LSTM layer fails?

Hi, I am trying to train a DDPG agent. My environment is built in Simulink and works fine when the networks contain only feed-forward layers. As soon as I add an LSTM layer, I get a "Not enough input arguments" error. I am using MATLAB R2023a and assumed that it supports LSTM layers in a DDPG network.
Could someone tell me what is going on?
Thanks :)
Code:
%% H2DF DDPG Trainer
%
%
% clc
% clear all
% close all
ObsInfo.Name = "Engine Outputs";
ObsInfo.Description = ' IMEP, NOX, SOOT, MPRR ';
%% Creating environment
obsInfo = rlNumericSpec([8 1],...
'LowerLimit',[-inf -inf -inf -inf -inf -inf -inf -inf ]',...
'UpperLimit',[inf inf inf inf inf inf inf inf]');
obsInfo.Name = "Engine Outputs";
obsInfo.Description = ' IMEP, NOX, SOOT, MPRR ';
numObservations = obsInfo.Dimension(1);
version = '1123_002_GRU';
Data = struct2table(load(['VSR', version, '_post.mat']));
Data.label = string(Data.label);
ind = boolean(sum(Data.label == C2C_NMPC.Labels.outputs.', 2));
outputs_mean = [Data.mean{boolean(ind)}].';
outputs_std = [Data.std{ind}].';
ind = boolean(sum(Data.label == C2C_NMPC.Labels.controls.', 2));
controls_mean = [Data.mean{ind}].';
controls_std = [Data.std{ind}].';
lower_limit_controls = [0.17e-3;350;-2;1e-3];
upper_limit_controls = [0.5e-3;900;3;5.5e-3];
lower_limit_controls_norm = (lower_limit_controls - controls_mean)./controls_std;
upper_limit_controls_norm = (upper_limit_controls - controls_mean)./controls_std;
Ts = 0.01;
Tf = 10;
variance_normalised = ([1e-5*0.0025*(80/sqrt(Ts));1e2*0.003*(12/sqrt(Ts));30*0.003*(0.4/sqrt(Ts));1e-1*0.00025*(0.4/sqrt(Ts))] - controls_mean)./controls_std;
actInfo = rlNumericSpec([4 1],'LowerLimit',lower_limit_controls_norm,'UpperLimit',upper_limit_controls_norm);
actInfo.Name = "Engine Inputs";
actInfo.Description = 'DOI, P2M, SOI, DOI_H2';
numActions = actInfo.Dimension(1);
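env = rlSimulinkEnv('MPC_RL_H2DF','MPC_RL_H2DF/RL Agent',...
obsInfo,actInfo);
% Assign the reset function after the environment exists; assigning it
% earlier is overwritten by the rlSimulinkEnv call.
env.ResetFcn = @(in)localResetFcn(in);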
% 375 engine cycle results
rng(0)
% 1200 - 0.1| 1900: 0.06
%% Creating Agent
L = 60; % number of neurons
statePath = [
featureInputLayer(numObservations, 'Normalization', 'none', 'Name', 'observation')
fullyConnectedLayer(L, 'Name', 'fc1')
reluLayer('Name', 'relu1')
fullyConnectedLayer(L, 'Name', 'fc11')
reluLayer('Name', 'relu11')
% fullyConnectedLayer(L, 'Name', 'fc12')
% reluLayer('Name', 'relu12')
lstmLayer(2,"OutputMode","sequence")
fullyConnectedLayer(L, 'Name', 'fc15')
reluLayer('Name', 'relu15')
fullyConnectedLayer(L, 'Name', 'fc2')
additionLayer(2,'Name','add')
reluLayer('Name','relu2')
% fullyConnectedLayer(L, 'Name', 'fc3')
% reluLayer('Name','relu3')
% fullyConnectedLayer(L, 'Name', 'fc7')
% reluLayer('Name','relu7')
fullyConnectedLayer(1, 'Name', 'fc4','BiasInitializer','ones','WeightsInitializer','he')];
actionPath = [
featureInputLayer(numActions, 'Normalization', 'none', 'Name', 'action')
fullyConnectedLayer(L, 'Name', 'fc6')
reluLayer('Name','relu6')
fullyConnectedLayer(L, 'Name', 'fc13')
reluLayer('Name','relu13')
fullyConnectedLayer(L, 'Name', 'fc14')
reluLayer('Name','relu14')
fullyConnectedLayer(L, 'Name', 'fc5','BiasInitializer','ones','WeightsInitializer','he')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = connectLayers(criticNetwork,'fc5','add/in2');
figure
plot(criticNetwork)
criticOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
'Observation',{'observation'},'Action',{'action'},criticOptions);
%%
actorNetwork = [
featureInputLayer(numObservations, 'Normalization', 'none', 'Name', 'observation')
fullyConnectedLayer(L, 'Name', 'fc1')
reluLayer('Name', 'relu1')
fullyConnectedLayer(L, 'Name', 'fc2')
reluLayer('Name', 'relu2')
fullyConnectedLayer(L, 'Name', 'fc3')
reluLayer('Name', 'relu3')
fullyConnectedLayer(L, 'Name', 'fc8')
reluLayer('Name', 'relu8')
fullyConnectedLayer(L, 'Name', 'fc9')
reluLayer('Name', 'relu9')
fullyConnectedLayer(L, 'Name', 'fc10')
reluLayer('Name', 'relu10')
fullyConnectedLayer(numActions, 'Name', 'fc4')
tanhLayer('Name','tanh1')
scalingLayer('Name','ActorScaling1','Scale',-(actInfo.UpperLimit-actInfo.LowerLimit)/2,'Bias',(actInfo.UpperLimit+actInfo.LowerLimit)/2)];
actorOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,...
'Observation',{'observation'},'Action',{'ActorScaling1'},actorOptions);
%% Deep Deterministic Policy Gradient (DDPG) agent
agentOpts = rlDDPGAgentOptions(...
'SampleTime',Ts,...
'TargetSmoothFactor',1e-3,...
'DiscountFactor',0.99, ...
'MiniBatchSize',1024, ...
'ExperienceBufferLength',1e7);
% agentOpts.NoiseOptions.Variance =
% [0.005*(70/sqrt(Ts));0.005*(12/sqrt(Ts));0.005*(0.4/sqrt(Ts))] v01
agentOpts.NoiseOptions.Variance = [1e-5*0.0025*(80/sqrt(Ts));1e2*0.003*(12/sqrt(Ts));30*0.003*(0.4/sqrt(Ts));1e-1*0.00025*(0.4/sqrt(Ts))];
agentOpts.NoiseOptions.Variance = variance_normalised;
agentOpts.NoiseOptions.VarianceDecayRate = [1e-6;1e-6;1e-6;1e-6];
% agent = rlDDPGAgent(actor,critic,agentOpts);
% variance*ts^2 = (0.01 - 0.1)*(action range)
% At each sample time step, the noise model is updated using the following formula, where Ts is the agent sample time.
%
% x(k) = x(k-1) + MeanAttractionConstant.*(Mean - x(k-1)).*Ts
% + Variance.*randn(size(Mean)).*sqrt(Ts)
% At each sample time step, the variance decays as shown in the following code.
%
% decayedVariance = Variance.*(1 - VarianceDecayRate);
% Variance = max(decayedVariance,VarianceMin);
% For continuous action signals, it is important to set the noise variance appropriately to encourage exploration. It is common to have Variance*sqrt(Ts) be between 1% and 10% of your action range.
%
% If your agent converges on local optima too quickly, promote agent exploration by increasing the amount of noise; that is, by increasing the variance. Also, to increase exploration, you can reduce the VarianceDecayRate.
%% Training agent
maxepisodes = 500;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes, ...
'MaxStepsPerEpisode',maxsteps, ...
'ScoreAveragingWindowLength',100, ...
'Verbose',false, ...
'UseParallel',false,...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',0,...
'SaveAgentCriteria','EpisodeReward','SaveAgentValue',-50');
%%
% % Set to true, to resume training from a saved agent
resumeTraining = false;
% % Set ResetExperienceBufferBeforeTraining to false to keep experience from the previous session
agentOpts.ResetExperienceBufferBeforeTraining = ~(resumeTraining);
if resumeTraining
% Load the agent from the previous session
fprintf('- Resume training of: %s\n', 'agentV04.mat');
trainedfile = load('D:\Masters\HiWi\h2dfannbasedmpc\acados_implementation\rl\savedAgents\Agent1620.mat','saved_agent');
agent =trainedfile.saved_agent;
else
% Create a fresh new agent
agent = rlDDPGAgent(actor, critic, agentOpts);
end
% agent = rlDDPGAgent(actor, critic, agentOpts);
% agent = rlDDPGAgent(actor,critic,agentOpts);
%% Train the agent
trainingStats = train(agent, env, trainOpts);
% trainingStats = train(agent,env,trainOpts);
% get the agent's actor, which predicts next action given the current observation
actor = getActor(agent);
% get the actor's parameters (neural network weights)
%actorParams = getLearnableParameterValues(actor);
And the error:
Error using rl.train.SeriesTrainer/run
There was an error executing the ProcessExperienceFcn for block "MPC_RL_H2DF/RL Agent".
Caused by:
Error using rl.function.AbstractFunction/evaluate
Unable to evaluate function model.
Error in rl.function.rlQValueFunction/getValue (line 74)
[qValue, state, batchSize, sequenceLength] = evaluate(this, [observation; action]);
Error in rl.agent.rlDDPGAgent/criticLearn_ (line 359)
targetQ = getValue(this.TargetCritic_,miniBatch.NextObservation,nextActions);
Error in rl.agent.rlDDPGAgent/learnFromBatchData_ (line 325)
[criticGradient, criticLoss] = criticLearn_(this, minibatch, maskIdx,sampleIdx,weights);
Error in rl.agent.AbstractOffPolicyAgent/learnFromBatchData (line 76)
[this, learnData] = learnFromBatchData_(this,batchData,maskIdx, Idx, Weights);
Error in rl.agent.rlDDPGAgent/learnFromExperiencesInMemory_ (line 307)
[~, learnData] = learnFromBatchData(this,minibatch,maskIdx, sampleIdx, weights);
Error in rl.agent.mixin.InternalMemoryTrainable/learnFromExperiencesInMemory (line 32)
learnFromExperiencesInMemory_(this);
Error in rl.agent.AbstractOffPolicyAgent/learn_ (line 104)
learnFromExperiencesInMemory(this);
Error in rl.agent.AbstractAgent/learn (line 29)
this = learn_(this,experience);
Error in rl.util.agentProcessStepExperience (line 6)
learn(Agent,Exp);
Error in rl.env.internal.FunctionHandlePolicyExperienceProcessor/processExperience_ (line 31)
[this.Policy_,this.Data_] = feval(this.Fcn_,...
Error in rl.env.internal.ExperienceProcessorInterface/processExperienceInternal_ (line 139)
processExperience_(this,experience,infoData);
Error in rl.env.internal.ExperienceProcessorInterface/processExperience (line 78)
stopsim = processExperienceInternal_(this,experience,simTime);
Error in rl.simulink.blocks.PolicyProcessExperience/stepImpl (line 45)
stopsim = processExperience(this.ExperienceProcessor_,experience,simTime);
Error in Simulink.Simulation.internal.DesktopSimHelper
Error in Simulink.Simulation.internal.DesktopSimHelper.sim
Error in Simulink.SimulationInput/sim
Error in rl.env.internal.SimulinkSimulator>localSim (line 259)
simout = sim(in);
Error in rl.env.internal.SimulinkSimulator>@(in)localSim(in,simPkg) (line 171)
simfcn = @(in) localSim(in,simPkg);
Error in MultiSim.internal.runSingleSim
Error in MultiSim.internal.SimulationRunnerSerial/executeImplSingle
Error in MultiSim.internal.SimulationRunnerSerial/executeImpl
Error in Simulink.SimulationManager/executeSims
Error in Simulink.SimulationManagerEngine/executeSims
Error in rl.env.internal.SimulinkSimulator/simInternal_ (line 172)
simInfo = executeSims(engine,simfcn,getSimulationInput(this));
Error in rl.env.internal.SimulinkSimulator/sim_ (line 78)
out = simInternal_(this,simPkg);
Error in rl.env.internal.AbstractSimulator/sim (line 30)
out = sim_(this,simData,policy,processExpFcn,processExpData);
Error in rl.env.AbstractEnv/runEpisode (line 144)
out = sim(simulator,simData,policy,processExpFcn,processExpData);
Error in rl.train.SeriesTrainer/run (line 59)
out = runEpisode(...
Error in rl.train.TrainingManager/train (line 479)
run(trainer);
Error in rl.train.TrainingManager/run (line 233)
train(this);
Error in rl.agent.AbstractAgent/train (line 136)
trainingResult = run(trainMgr,checkpoint);
Error in MPC_RL_lstm_run_pureR_H2DF (line 195)
trainingStats = train(agent, env, trainOpts);
Caused by:
Not enough input arguments.
Error in rl.train.TrainingManager/train (line 479)
run(trainer);
Error in rl.train.TrainingManager/run (line 233)
train(this);
Error in rl.agent.AbstractAgent/train (line 136)
trainingResult = run(trainMgr,checkpoint);
Error in MPC_RL_lstm_run_pureR_H2DF (line 195)
trainingStats = train(agent, env, trainOpts);

 Accepted Answer

Hello,
I see a couple of things wrong with the current architecture (there could be more):
1) When you use an LSTM layer, the input layer should be a sequence input layer (sequenceInputLayer), not a feature input layer (featureInputLayer).
2) The LSTM layer should be used in both the actor and the critic.
I think the easiest way to arrive at a correct architecture is to start from the default agent feature. You can then take the generated architecture and fine-tune it for your specific application. See for example here. Make sure to specify in the agent initialization options that you want a recurrent (RNN) network.
Hope that helps.
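As a rough illustration of the default-agent route described in this answer, the sketch below lets the toolbox generate a recurrent actor and critic and then inspects them. The observation and action sizes follow the first script in the question; the [-1, 1] action limits, the hidden-unit count, and the sequence length are placeholder assumptions, not values confirmed anywhere in the thread.

```matlab
% Placeholder specs (assumed): 8 observations, 4 actions bounded to [-1, 1].
obsInfo = rlNumericSpec([8 1]);
actInfo = rlNumericSpec([4 1], 'LowerLimit', -ones(4,1), 'UpperLimit', ones(4,1));

% Ask for recurrent default networks with 60 hidden units per layer.
initOpts = rlAgentInitializationOptions('NumHiddenUnit', 60, 'UseRNN', true);

% Recurrent agents are trained on trajectories, so SequenceLength must exceed 1.
agentOpts = rlDDPGAgentOptions('SampleTime', 0.01, ...
    'SequenceLength', 32, 'MiniBatchSize', 64);

% Create the agent with default recurrent actor and critic networks.
agent = rlDDPGAgent(obsInfo, actInfo, initOpts, agentOpts);

% Extract the generated networks to use as a starting point for fine-tuning.
actorNet  = getModel(getActor(agent));
criticNet = getModel(getCritic(agent));
disp(actorNet.Layers)
disp(criticNet.Layers)
```

The displayed layer lists show where the toolbox places the sequence input and LSTM layers, which can then be adapted by hand.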

5 Comments

Thanks for the answer!
I am now trying to replicate the performance with a simple feed-forward network, but I still run into the same error.
Could you please comment?
TIA :)
%% H2DF DDPG Trainer
%
% Author- Vasu Sharma
% clc
% clear all
% close all
ObsInfo.Name = "Engine Outputs";
ObsInfo.Description = ' IMEP, NOX, SOOT, MPRR ';
%% Creating environment
obsInfo = rlNumericSpec([16 1],...
'LowerLimit',[-inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf]',...
'UpperLimit',[inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf]');
obsInfo.Name = "Engine Outputs";
obsInfo.Description = ' IMEP, NOX, SOOT, MPRR, IMEP_t-1,IMEP_ref,IMEP_ref_t-1, IMEP_error, states';
numObservations = obsInfo.Dimension(1);
actInfo = rlNumericSpec([4 1],'LowerLimit',[0.17e-3;440;-1;1e-3],'UpperLimit',[0.5e-3;440;0;5.5e-3]);
actInfo.Name = "Engine Inputs";
actInfo.Description = 'DOI, P2M, SOI, DOI_H2';
numActions = actInfo.Dimension(1);
env = rlSimulinkEnv('MPC_RL_H2DF','MPC_RL_H2DF/RL Agent',...
obsInfo,actInfo);
env.ResetFcn = @(in)localResetFcn(in);
Ts = 0.08;
Tf = 20;
% 375 engine cycle results
rng(0)
% 1200 - 0.1| 1900: 0.06
%% Creating Agent
L = 60; % number of neurons
statePath = [
sequenceInputLayer(numObservations, 'Normalization', 'none', 'Name', 'observation')
fullyConnectedLayer(L, 'Name', 'fc1')
reluLayer('Name', 'relu1')
fullyConnectedLayer(L, 'Name', 'fc11')
reluLayer('Name', 'relu11')
fullyConnectedLayer(L, 'Name', 'fc12')
reluLayer('Name', 'relu12')
fullyConnectedLayer(L, 'Name', 'fc15')
reluLayer('Name', 'relu15')
fullyConnectedLayer(L, 'Name', 'fc2')
additionLayer(2,'Name','add')
reluLayer('Name','relu2')
fullyConnectedLayer(L, 'Name', 'fc3')
reluLayer('Name','relu3')
fullyConnectedLayer(L, 'Name', 'fc18')
reluLayer('Name','relu18')
fullyConnectedLayer(L, 'Name', 'fc19')
reluLayer('Name','relu19')
fullyConnectedLayer(1, 'Name', 'fc4')];
actionPath = [
sequenceInputLayer(numActions, 'Normalization', 'none', 'Name', 'action')
fullyConnectedLayer(L, 'Name', 'fc6')
reluLayer('Name','relu6')
fullyConnectedLayer(L, 'Name', 'fc13')
reluLayer('Name','relu13')
fullyConnectedLayer(L, 'Name', 'fc14')
reluLayer('Name','relu14')
fullyConnectedLayer(L, 'Name', 'fc5')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = connectLayers(criticNetwork,'fc5','add/in2');
figure
plot(criticNetwork)
criticOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
'Observation',{'observation'},'Action',{'action'},criticOptions);
%%
actorNetwork = [
sequenceInputLayer(numObservations, 'Normalization', 'none', 'Name', 'observation')
fullyConnectedLayer(L, 'Name', 'fc1')
reluLayer('Name', 'relu1')
fullyConnectedLayer(L, 'Name', 'fc2')
reluLayer('Name', 'relu2')
fullyConnectedLayer(L, 'Name', 'fc3')
reluLayer('Name', 'relu3')
fullyConnectedLayer(L, 'Name', 'fc9')
reluLayer('Name', 'relu9')
fullyConnectedLayer(L, 'Name', 'fc10')
reluLayer('Name', 'relu10')
fullyConnectedLayer(numActions, 'Name', 'fc4')
tanhLayer('Name','tanh1')
scalingLayer('Name','ActorScaling1','Scale',-(actInfo.UpperLimit-actInfo.LowerLimit)/2,'Bias',(actInfo.UpperLimit+actInfo.LowerLimit)/2)];
actorOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,...
'Observation',{'observation'},'Action',{'ActorScaling1'},actorOptions);
%% Deep Deterministic Policy Gradient (DDPG) agent
agentOpts = rlDDPGAgentOptions(...
'SampleTime',Ts,...
'TargetSmoothFactor',1e-3,...
'DiscountFactor',0.99, ...
'MiniBatchSize',256, ...
'SequenceLength',32,...
'ExperienceBufferLength',1e5, ...
'TargetUpdateFrequency', 10);
% agentOpts.NoiseOptions.Variance =
% [0.005*(70/sqrt(Ts));0.005*(12/sqrt(Ts));0.005*(0.4/sqrt(Ts))] v01
agentOpts.NoiseOptions.Variance = [1e-5*0.0025*(80/sqrt(Ts));1e2*0.003*(12/sqrt(Ts));30*0.003*(0.4/sqrt(Ts));1e-1*0.00025*(0.4/sqrt(Ts))];
agentOpts.NoiseOptions.Variance =20*[1.65000000000000e-05;0;0;0.000225000000000000];
agentOpts.NoiseOptions.VarianceDecayRate = [1e-5;1e-5;1e-5;1e-5];
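% Note: the actor and critic were already created above, so these two
% assignments do not affect them. Set UseDevice before creating the
% representations if GPU training is intended.
criticOptions.UseDevice = "gpu";
actorOptions.UseDevice = "gpu";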
% agent = rlDDPGAgent(actor,critic,agentOpts);
% variance*ts^2 = (0.01 - 0.1)*(action range)
% At each sample time step, the noise model is updated using the following formula, where Ts is the agent sample time.
%
% x(k) = x(k-1) + MeanAttractionConstant.*(Mean - x(k-1)).*Ts
% + Variance.*randn(size(Mean)).*sqrt(Ts)
% At each sample time step, the variance decays as shown in the following code.
%
% decayedVariance = Variance.*(1 - VarianceDecayRate);
% Variance = max(decayedVariance,VarianceMin);
% For continuous action signals, it is important to set the noise variance appropriately to encourage exploration. It is common to have Variance*sqrt(Ts) be between 1% and 10% of your action range.
%
% If your agent converges on local optima too quickly, promote agent exploration by increasing the amount of noise; that is, by increasing the variance. Also, to increase exploration, you can reduce the VarianceDecayRate.
%% Training agent
maxepisodes = 10000;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes, ...
'MaxStepsPerEpisode',maxsteps, ...
'ScoreAveragingWindowLength',100, ...
'Verbose',true, ...
'UseParallel',false,...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',0,...
'SaveAgentCriteria','EpisodeReward','SaveAgentValue',-0.1');
%%
% % Set to true, to resume training from a saved agent
resumeTraining = false;
% % Set ResetExperienceBufferBeforeTraining to false to keep experience from the previous session
agentOpts.ResetExperienceBufferBeforeTraining = ~(resumeTraining);
if resumeTraining
% Load the agent from the previous session
fprintf('- Resume training of: %s\n', 'agentV04.mat');
trained_agent = load('D:\Masters\HiWi\h2dfannbasedmpc\acados_implementation\rl\savedAgents\Agent253.mat');
agent = trained_agent.saved_agent ;
else
% Create a fresh new agent
agent = rlDDPGAgent(actor, critic, agentOpts);
end
% agent = rlDDPGAgent(actor, critic, agentOpts);
% agent = rlDDPGAgent(actor,critic,agentOpts);
%% Train the agent
trainingStats = train(agent, env, trainOpts);
%trainingStats = train(agent,env,trainOpts);
% get the agent's actor, which predicts next action given the current observation
actor = getActor(agent);
% get the actor's parameters (neural network weights)
%actorParams = getLearnableParameterValues(actor);
Error Message:
Error using rl.train.SeriesTrainer/run
There was an error executing the ProcessExperienceFcn for block "MPC_RL_H2DF/RL Agent".
Caused by:
Error using rl.function.AbstractFunction/evaluate
Unable to evaluate function model.
Error in rl.function.rlContinuousDeterministicActor/getAction_ (line 32)
[action, state] = evaluate(this, observation);
Error in rl.function.AbstractActorFunction/getAction (line 79)
[action, state] = getAction_(this, observation);
Error in rl.policy.rlAdditiveNoisePolicy/getAction_ (line 129)
[action,state] = getAction(this.Actor,observation);
Error in rl.policy.PolicyInterface/getAction (line 36)
[action,this] = getAction_(this,observation);
Error in rl.agent.AbstractOffPolicyAgent/getExplorationAction_ (line 116)
[action,this.ExplorationPolicy_] = getAction(this.ExplorationPolicy_,...
Error in rl.agent.AbstractAgent/getAction_ (line 90)
[action,this] = getExplorationAction_(this,observation);
Error in rl.policy.PolicyInterface/getAction (line 36)
[action,this] = getAction_(this,observation);
Error in rl.env.internal.PolicyExperienceProcessorInterface/evaluateAction_ (line 32)
[action,this.Policy_] = getAction(this.Policy_,observation);
Error in rl.env.internal.ExperienceProcessorInterface/evaluateAction (line 62)
action = evaluateAction_(this,observation);
Error in rl.simulink.blocks.PolicyProcessExperience/stepImpl (line 56)
act_sig = evaluateAction(this.ExperienceProcessor_,experience.NextObservation);
Error in Simulink.Simulation.internal.DesktopSimHelper
Error in Simulink.Simulation.internal.DesktopSimHelper.sim
Error in Simulink.SimulationInput/sim
Error in rl.env.internal.SimulinkSimulator>localSim (line 259)
simout = sim(in);
Error in rl.env.internal.SimulinkSimulator>@(in)localSim(in,simPkg) (line 171)
simfcn = @(in) localSim(in,simPkg);
Error in MultiSim.internal.runSingleSim
Error in MultiSim.internal.SimulationRunnerSerial/executeImplSingle
Error in MultiSim.internal.SimulationRunnerSerial/executeImpl
Error in Simulink.SimulationManager/executeSims
Error in Simulink.SimulationManagerEngine/executeSims
Error in rl.env.internal.SimulinkSimulator/simInternal_ (line 172)
simInfo = executeSims(engine,simfcn,getSimulationInput(this));
Error in rl.env.internal.SimulinkSimulator/sim_ (line 78)
out = simInternal_(this,simPkg);
Error in rl.env.internal.AbstractSimulator/sim (line 30)
out = sim_(this,simData,policy,processExpFcn,processExpData);
Error in rl.env.AbstractEnv/runEpisode (line 144)
out = sim(simulator,simData,policy,processExpFcn,processExpData);
Error in rl.train.SeriesTrainer/run (line 59)
out = runEpisode(...
Error in rl.train.TrainingManager/train (line 479)
run(trainer);
Error in rl.train.TrainingManager/run (line 233)
train(this);
Error in rl.agent.AbstractAgent/train (line 136)
trainingResult = run(trainMgr,checkpoint);
Error in MPC_RL_lstm_run_pureR_H2DF (line 180)
trainingStats = train(agent, env, trainOpts);
Caused by:
Brace indexing is not supported for variables of this type.
Error in rl.train.TrainingManager/train (line 479)
run(trainer);
Error in rl.train.TrainingManager/run (line 233)
train(this);
Error in rl.agent.AbstractAgent/train (line 136)
trainingResult = run(trainMgr,checkpoint);
Error in MPC_RL_lstm_run_pureR_H2DF (line 180)
trainingStats = train(agent, env, trainOpts);
Hi,
The answer I provided above is still relevant. A sequenceInputLayer has to be used together with an LSTM layer. If you have a feed-forward network, use a featureInputLayer as your first layer.
As I mentioned above, if you use the default agent feature, you can avoid these types of errors.
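The pairing rule in this comment (sequence input with LSTM, feature input with feed-forward) could be sketched by hand as follows. This is a minimal illustration rather than the asker's actual networks: the spec sizes, the [-1, 1] action limits, and the layer names are assumptions, and it uses the rlQValueFunction and rlContinuousDeterministicActor objects instead of the older representation objects from the question.

```matlab
% Assumed specs for illustration only.
obsInfo = rlNumericSpec([8 1]);
actInfo = rlNumericSpec([4 1], 'LowerLimit', -ones(4,1), 'UpperLimit', ones(4,1));
numObs = obsInfo.Dimension(1);
numAct = actInfo.Dimension(1);
L = 60;  % hidden units, as in the question

% Critic: both inputs are sequence inputs and are merged before the LSTM layer.
obsPath = [
    sequenceInputLayer(numObs, 'Name', 'observation')
    fullyConnectedLayer(L, 'Name', 'obsFC')];
actPath = [
    sequenceInputLayer(numAct, 'Name', 'action')
    fullyConnectedLayer(L, 'Name', 'actFC')];
commonPath = [
    additionLayer(2, 'Name', 'add')
    reluLayer('Name', 'relu')
    lstmLayer(L, 'OutputMode', 'sequence', 'Name', 'lstm')
    fullyConnectedLayer(1, 'Name', 'qValue')];
criticLG = layerGraph(obsPath);
criticLG = addLayers(criticLG, actPath);
criticLG = addLayers(criticLG, commonPath);
criticLG = connectLayers(criticLG, 'obsFC', 'add/in1');
criticLG = connectLayers(criticLG, 'actFC', 'add/in2');
critic = rlQValueFunction(dlnetwork(criticLG), obsInfo, actInfo, ...
    'ObservationInputNames', "observation", 'ActionInputNames', "action");

% Actor: also recurrent, so that it matches the critic.
actorLayers = [
    sequenceInputLayer(numObs, 'Name', 'observation')
    fullyConnectedLayer(L, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    lstmLayer(L, 'OutputMode', 'sequence', 'Name', 'lstm')
    fullyConnectedLayer(numAct, 'Name', 'fcOut')
    tanhLayer('Name', 'tanh')
    scalingLayer('Name', 'scale', ...
        'Scale', (actInfo.UpperLimit - actInfo.LowerLimit)/2, ...
        'Bias',  (actInfo.UpperLimit + actInfo.LowerLimit)/2)];
actor = rlContinuousDeterministicActor(dlnetwork(actorLayers), obsInfo, actInfo);

% Recurrent agents need SequenceLength greater than 1.
agentOpts = rlDDPGAgentOptions('SampleTime', 0.01, 'SequenceLength', 32);
agent = rlDDPGAgent(actor, critic, agentOpts);

% Quick check that the agent can be evaluated before starting training.
act = getAction(agent, {rand(obsInfo.Dimension)});
```

For a purely feed-forward design, the same structure applies with featureInputLayer in place of sequenceInputLayer, the lstmLayer entries removed, and SequenceLength left at its default.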
Vasu Sharma on 31 Jan 2024 (edited 31 Jan 2024)
Thanks a lot for the previous inputs, I understood the mistakes I was making.
I have a more conceptual question about my problem. I am trying to learn an engine control model with a DDPG agent, where an LSTM model of my engine serves as the plant.
I am training the DDPG agent to follow a reference load trajectory, as shown below (dashed line in the top-left graph). Despite trying various network architectures, noise options, and learning rates, the learned agent chooses to deliver a constant load of around 6 (orange line in the top-left graph) rather than follow the given reference trajectory. The outputs seem to vary reasonably (here in blue), but the learning is still not acceptable.
I tweak the trajectory every episode to aid learning, so that the agent sees various load profiles.
Could you kindly advise what might be going on here?
Additional information: the same effect occurs if I ask the controller to match a constant load trajectory (constant within an episode, then changed to another random constant for the next episode).
Thanks in advance :)
Best,
Vasu
Can you please post this as a separate question? Nested questions are not easy to discover.
Thanks


More Answers (1)

Gagan Agarwal on 21 Dec 2023
Hi Vasu,
I understand that you are encountering an error when running the provided code in MATLAB. LSTM layers are supported in DDPG networks in MATLAB.
To address the error, consider the following suggestions:
  1. Confirm that the Reinforcement Learning Toolbox is installed in your MATLAB environment.
  2. Review the training options and agent options to ensure they are configured correctly.
  3. Verify that the observation and action specifications align with the input requirements of your network architectures.
I hope this helps!
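Suggestions 1 and 3 can be checked from the command line before training starts. The sketch below is one possible way to do so; the toy networks and the 8-observation, 4-action sizes are assumptions for illustration and stand in for your own actor and critic.

```matlab
% Suggestion 1: check that the Reinforcement Learning Toolbox is on the path.
assert(~isempty(which('rlDDPGAgent')), ...
    'Reinforcement Learning Toolbox not found on the MATLAB path.');

% Assumed specs for illustration only.
obsInfo = rlNumericSpec([8 1]);
actInfo = rlNumericSpec([4 1]);

% Toy feed-forward actor (stand-in for your own network).
actorNet = dlnetwork([
    featureInputLayer(8, 'Name', 'observation')
    fullyConnectedLayer(4, 'Name', 'fcOut')]);
actor = rlContinuousDeterministicActor(actorNet, obsInfo, actInfo);

% Toy feed-forward critic with separate observation and action inputs.
criticLG = layerGraph([
    featureInputLayer(8, 'Name', 'observation')
    concatenationLayer(1, 2, 'Name', 'cat')
    fullyConnectedLayer(1, 'Name', 'qValue')]);
criticLG = addLayers(criticLG, featureInputLayer(4, 'Name', 'action'));
criticLG = connectLayers(criticLG, 'action', 'cat/in2');
critic = rlQValueFunction(dlnetwork(criticLG), obsInfo, actInfo, ...
    'ObservationInputNames', "observation", 'ActionInputNames', "action");

% Suggestion 3: evaluate both functions on a random observation. A mismatch
% between the specs and the network inputs shows up here, not mid-training.
obs = {rand(obsInfo.Dimension)};
act = getAction(actor, obs);
q   = getValue(critic, obs, act);
assert(isequal(size(act{1}), actInfo.Dimension));
assert(isscalar(q));
```

Running the same getAction and getValue calls on the real actor and critic is a quick way to surface input-layer mismatches with a much shorter stack trace than the one produced inside train.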
