Reinforcement Learning PPO Problem

Hi, I am currently working on a continuous PPO implementation. However, I encountered the following problem with the setup of the actor network. Can anyone suggest a solution? Thank you.
Command Window output:
Caused by:
Layer 'mean&sdev': Input size mismatch. Size of input to this layer is different from the expected input size.
Inputs to this layer:
from layer 'scale' (output size 2)
from layer 'splus' (output size 2)
%%
% L = 100; % number of neurons
% statePath = [
% featureInputLayer(6,'Normalization','none','Name','observation')
% fullyConnectedLayer(L,'Name','fc1')
% reluLayer('Name','relu1')
% fullyConnectedLayer(L,'Name','fc2')
% additionLayer(2,'Name','add')
% reluLayer('Name','relu2')
% fullyConnectedLayer(L,'Name','fc3')
% reluLayer('Name','relu3')
% fullyConnectedLayer(1,'Name','fc4')];
%
% actionPath = [
% featureInputLayer(2,'Normalization','none','Name','action')
% fullyConnectedLayer(L,'Name','fc5')];
%%
L = 100; % number of neurons per hidden layer
criticNetwork = [
featureInputLayer(6,'Normalization','none','Name','observations')
fullyConnectedLayer(L,'Name','fc1')
reluLayer('Name','relu1')
fullyConnectedLayer(L,'Name','fc2')
reluLayer('Name','relu2')
fullyConnectedLayer(L,'Name','fc3')
reluLayer('Name','relu3')
fullyConnectedLayer(1,'Name','fc4')];
%%
% criticNetwork = layerGraph(statePath);
% criticNetwork = addLayers(criticNetwork,actionPath);
%
% criticNetwork = connectLayers(criticNetwork,'fc5','add/in2');
%% Create critic representation
useGPU = false;
criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1,'L2RegularizationFactor',1e-4);
if useGPU
criticOptions.UseDevice = 'gpu'; % select the device after the options object is created
end
critic = rlValueRepresentation(criticNetwork,observationInfo,...
'Observation',{'observations'},criticOptions);
%%
% input path layers (6 by 1 input and a 2 by 1 output)
inPath = [ featureInputLayer(6,'Normalization','none','Name','observations')
fullyConnectedLayer(L,'Name','fc1')
reluLayer('Name','relu1')
fullyConnectedLayer(L,'Name','fc2')
reluLayer('Name','relu2')
fullyConnectedLayer(2,'Name','fc3')];
% path layers for mean value (2 by 1 input and 2 by 1 output)
% using scalingLayer to scale the range
meanPath = [ tanhLayer('Name','tanh'); % output range: (-1,1)
scalingLayer('Name','scale','Scale',actionInfo.UpperLimit) ]; % output range: (-10,10)
% path layers for standard deviations (2 by 1 input and output)
% using softplus layer to make it non negative
sdevPath = softplusLayer('Name', 'splus');
% concatenate the two inputs (along dimension #3) to form a single 4 by 1 output layer
outLayer = concatenationLayer(3,2,'Name','mean&sdev');
% add layers to network object
actorNetwork = layerGraph(inPath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,sdevPath);
actorNetwork = addLayers(actorNetwork,outLayer);
% connect layers: the mean value path output MUST be connected to the FIRST input of the concatenationLayer
actorNetwork = connectLayers(actorNetwork,'fc3','tanh/in'); % connect output of inPath to meanPath input
actorNetwork = connectLayers(actorNetwork,'fc3','splus/in'); % connect output of inPath to sdevPath input
actorNetwork = connectLayers(actorNetwork,'scale','mean&sdev/in1'); % connect output of meanPath to concatenation input #1
actorNetwork = connectLayers(actorNetwork,'splus','mean&sdev/in2'); % connect output of sdevPath to concatenation input #2
% plot network
plot(actorNetwork);
%%
% actorNetwork = [
% featureInputLayer(6,'Normalization','none','Name','observations')
% fullyConnectedLayer(L,'Name','fc1')
% reluLayer('Name','relu1')
% fullyConnectedLayer(L,'Name','fc2')
% reluLayer('Name','relu2')
% fullyConnectedLayer(L,'Name','fc3')
% softmaxLayer('Name','actionProb')];
actorOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1,'L2RegularizationFactor',1e-4);
if useGPU
actorOptions.UseDevice = 'gpu'; % select the device before the representation is created
end
% Create actor representation
actor = rlStochasticActorRepresentation(actorNetwork,observationInfo,actionInfo,...
'Observation',{'observations'},actorOptions);
agentOptions = rlPPOAgentOptions(...
'SampleTime',Tf,...
'ExperienceHorizon',200,...
'ClipFactor',0.2,...
'EntropyLossWeight',0.01,...
'NumEpoch',3,...
'AdvantageEstimateMethod',"gae",...
'GAEFactor',0.95,...
'DiscountFactor',0.99,...
'MiniBatchSize',64);
agentOptions.NoiseOptions.Variance = [0.6;0.1];
agentOptions.NoiseOptions.VarianceDecayRate = 1e-5;
%%
agent = rlPPOAgent(actor,critic,agentOptions);
%%
maxepisodes = 5000;
maxsteps = ceil(Ts/Tf); % Ts: total simulation time, Tf: sample time
trainingOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes,...
'MaxStepsPerEpisode',maxsteps,...
'Verbose',true,...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',300,...
'SaveAgentCriteria','EpisodeReward',...
'SaveAgentValue',100);
%'ScoreAveragingWindowLength',250,...  % number of consecutive episodes used for averaging
% After training, the agent can be saved from the workspace, e.g.:
%   save(trainingOpts.SaveAgentDirectory + "/finalAgent.mat",'agent')
% Displaying the options in the workspace shows, for example:
%trainOpts =
% rlTrainingOptions with properties:
%
% MaxEpisodes: 1000
% MaxStepsPerEpisode: 1000
% ScoreAveragingWindowLength: 5
% StopTrainingCriteria: "AverageReward"
% StopTrainingValue: 480
% SaveAgentCriteria: "none"
% SaveAgentValue: "none"
% SaveAgentDirectory: "savedAgents"
%%
doTraining = false;
if doTraining
% Train the agent.
trainingStats = train(agent,env,trainingOpts);
else
% Load a pretrained agent for the example.
load('Trains/savedAgents_1/finalAgent','agent')
end
simOptions = rlSimulationOptions('MaxSteps',maxsteps);
experience = sim(env,agent,simOptions);
%%
function in = localResetFcn(in)
% randomize the initial conditions at the start of each episode
in = setVariable(in,'e1_initial', 0.5*(-1+2*rand)); % random initial lateral deviation
in = setVariable(in,'e2_initial', 0.1*(-1+2*rand)); % random initial relative yaw angle
end

Answers (1)

Emmanouil Tzorakoleftherakis
Edited: Emmanouil Tzorakoleftherakis on 8 Jan 2021
Hello,
Please take a look at how to create the actor and critic networks for continuous PPO here. It seems there is a dimension mismatch and following the doc example should help.
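For comparison, here is a minimal, self-contained sketch of a Gaussian actor network in the same style as the question's code. The observation/action sizes (6 and 2) and the action limits are taken from the question; the use of concatenationLayer(1,2,...) instead of concatenationLayer(3,2,...) is only an assumption about where the mismatch comes from (featureInputLayer produces 1-D channel data, so the mean and standard-deviation paths would be concatenated along dimension 1), not a confirmed fix.
% Sketch only: spec sizes, layer sizes, and the concatenation dimension are assumptions.
obsInfo = rlNumericSpec([6 1]);
actInfo = rlNumericSpec([2 1],'LowerLimit',-10,'UpperLimit',10);
inPath = [ featureInputLayer(6,'Normalization','none','Name','observations')
fullyConnectedLayer(100,'Name','fc1')
reluLayer('Name','relu1')
fullyConnectedLayer(2,'Name','fc_common') ];
meanPath = [ tanhLayer('Name','tanh')                                  % output in (-1,1)
scalingLayer('Name','scale','Scale',actInfo.UpperLimit) ];             % rescale to the action range
sdevPath = softplusLayer('Name','splus');                              % keep standard deviations non-negative
outLayer = concatenationLayer(1,2,'Name','mean&sdev');                 % concatenate along dimension 1 (assumption)
net = layerGraph(inPath);
net = addLayers(net,meanPath);
net = addLayers(net,sdevPath);
net = addLayers(net,outLayer);
net = connectLayers(net,'fc_common','tanh');
net = connectLayers(net,'fc_common','splus');
net = connectLayers(net,'scale','mean&sdev/in1');                      % mean path must be input #1
net = connectLayers(net,'splus','mean&sdev/in2');                      % std dev path is input #2
actor = rlStochasticActorRepresentation(net,obsInfo,actInfo, ...
'Observation',{'observations'});
The network output then has 2*numActions = 4 elements (two means followed by two standard deviations), which is what the continuous stochastic actor expects.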
If you are using R2020b, there is a new feature that lets you create a PPO agent without creating the actor and critic neural networks - Reinforcement Learning Toolbox will create a default architecture for you that you can then modify as needed. Please take a look at this example to see how to implement this.
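As a rough illustration of that default-agent workflow (the spec sizes and the 'NumHiddenUnit' value below are assumptions for the sake of the example):
obsInfo = rlNumericSpec([6 1]);
actInfo = rlNumericSpec([2 1],'LowerLimit',-10,'UpperLimit',10);
initOpts = rlAgentInitializationOptions('NumHiddenUnit',128);   % size of the generated hidden layers
agent = rlPPOAgent(obsInfo,actInfo,initOpts);                   % default actor and critic are generated
% The generated actor and critic can be inspected and modified afterwards
defaultActor  = getActor(agent);
defaultCritic = getCritic(agent);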
  2 Comments
onder kelevic on 10 Jan 2021
Thank you for your answer. I created a PPO agent without creating the actor and critic neural networks, but I still ran into a problem: the actor's learning rate is shown as 0.01 in the RL Episode Manager, while in my code and workspace it is 1e-4. I am working on highway path-following control, and I don't think training will converge with 0.01. How can I change the actor LearnRate that the Episode Manager reports? Do you have any ideas? This is what I tried:
actor.Options.LearnRate = 1e-4;
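One possible way to push a non-default learning rate into a default-generated agent is to pull the actor out, rebuild it with explicit representation options, and put it back. This is only a sketch under the assumption that the R2020b representation API (getActor, getModel, setActor) round-trips the generated network, and that obsInfo and actInfo are the same specs the agent was created with:
actor = getActor(agent);                            % extract the generated actor
net   = getModel(actor);                            % its underlying network
newOpts = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1);
% Rebuild the actor with the new options and put it back into the agent.
% net.InputNames is used so the observation layer name does not have to be guessed.
actor = rlStochasticActorRepresentation(net,obsInfo,actInfo, ...
'Observation',net.InputNames,newOpts);
agent = setActor(agent,actor);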

