Reinforcement Learning PPO Problem
Hi, I am currently working on a continuous PPO implementation. However, I encountered the following error when setting up the actor network. Can anyone suggest a solution? Thank you.
Command Window output:
Caused by:
Layer 'mean&sdev': Input size mismatch. Size of input to this layer is different from the expected input size.
Inputs to this layer:
from layer 'scale' (output size 2)
from layer 'splus' (output size 2)
%%
% L = 100; % number of neurons
% statePath = [
% featureInputLayer(6,'Normalization','none','Name','observation')
% fullyConnectedLayer(L,'Name','fc1')
% reluLayer('Name','relu1')
% fullyConnectedLayer(L,'Name','fc2')
% additionLayer(2,'Name','add')
% reluLayer('Name','relu2')
% fullyConnectedLayer(L,'Name','fc3')
% reluLayer('Name','relu3')
% fullyConnectedLayer(1,'Name','fc4')];
%
% actionPath = [
% featureInputLayer(2,'Normalization','none','Name','action')
% fullyConnectedLayer(L,'Name','fc5')];
%%
L = 100;
criticNetwork = [
featureInputLayer(6,'Normalization','none','Name','observations')
fullyConnectedLayer(L,'Name','fc1')
reluLayer('Name','relu1')
fullyConnectedLayer(L,'Name','fc2')
reluLayer('Name','relu2')
fullyConnectedLayer(L,'Name','fc3')
reluLayer('Name','relu3')
fullyConnectedLayer(1,'Name','fc4')];
%%
% criticNetwork = layerGraph(statePath);
% criticNetwork = addLayers(criticNetwork,actionPath);
%
% criticNetwork = connectLayers(criticNetwork,'fc5','add/in2');
%% Create critic representation
useGPU = false;
criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1,'L2RegularizationFactor',1e-4);
if useGPU
criticOptions.UseDevice = 'gpu'; % set after the options object is created, or this line errors
end
critic = rlValueRepresentation(criticNetwork,observationInfo,...
'Observation',{'observations'},criticOptions);
%%
% input path layers (6 by 1 input and a 2 by 1 output)
inPath = [ featureInputLayer(6,'Normalization','none','Name','observations')
fullyConnectedLayer(L,'Name','fc1')
reluLayer('Name','relu1')
fullyConnectedLayer(L,'Name','fc2')
reluLayer('Name','relu2')
fullyConnectedLayer(2,'Name','fc3')];
% path layers for mean value (2 by 1 input and 2 by 1 output)
% using scalingLayer to scale the range
meanPath = [ tanhLayer('Name','tanh'); % output range: (-1,1)
scalingLayer('Name','scale','Scale',actionInfo.UpperLimit) ]; % output range: (-10,10)
% path layers for standard deviations (2 by 1 input and output)
% using softplus layer to make it non negative
sdevPath = softplusLayer('Name', 'splus');
% concatenate the two inputs (along dimension #3) to form a single (4 by 1) output layer
outLayer = concatenationLayer(3,2,'Name','mean&sdev');
% add layers to network object
actorNetwork = layerGraph(inPath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,sdevPath);
actorNetwork = addLayers(actorNetwork,outLayer);
% connect layers: the mean value path output MUST be connected to the FIRST input of the concatenationLayer
actorNetwork = connectLayers(actorNetwork,'fc3','tanh/in'); % connect output of inPath to meanPath input
actorNetwork = connectLayers(actorNetwork,'fc3','splus/in'); % connect output of inPath to sdevPath input
actorNetwork = connectLayers(actorNetwork,'scale','mean&sdev/in1'); % connect output of meanPath to gaussPars input #1
actorNetwork = connectLayers(actorNetwork,'splus','mean&sdev/in2');% connect output of sdevPath to gaussPars input #2
% plot network
plot(actorNetwork);
%%
% actorNetwork = [
% featureInputLayer(6,'Normalization','none','Name','observations')
% fullyConnectedLayer(L,'Name','fc1')
% reluLayer('Name','relu1')
% fullyConnectedLayer(L,'Name','fc2')
% reluLayer('Name','relu2')
% fullyConnectedLayer(L,'Name','fc3')
% softmaxLayer('Name','actionProb')];
actorOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1,'L2RegularizationFactor',1e-4);
% Create actor representation
if useGPU
actorOptions.UseDevice = 'gpu'; % set before creating the actor, or it has no effect
end
actor = rlStochasticActorRepresentation(actorNetwork,observationInfo,actionInfo,...
'Observation',{'observations'},actorOptions);
agentOptions = rlPPOAgentOptions(...
'SampleTime',Tf,...
'ExperienceHorizon',200,...
'ClipFactor',0.2,...
'EntropyLossWeight',0.01,...
'NumEpoch',3,...
'AdvantageEstimateMethod',"gae",...
'GAEFactor',0.95,...
'DiscountFactor',0.99,...
'MiniBatchSize',64);
% Note: rlPPOAgentOptions has no NoiseOptions property (that belongs to DDPG/TD3 agents);
% PPO explores through its stochastic policy, tuned via 'EntropyLossWeight' above.
%%
agent = rlPPOAgent(actor,critic,agentOptions);
%%
maxepisodes = 5000;
maxsteps = ceil(Ts/Tf); % Ts: total simulation time, Tf: sample time
trainingOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes,...
'MaxStepsPerEpisode',maxsteps,...
'Verbose',true,...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',300,...
'SaveAgentCriteria','EpisodeReward',...
'SaveAgentValue',100);
%'ScoreAveragingWindowLength',250,... % number of consecutive episodes to average over
% To save the trained agent from the workspace:
% save(trainingOpts.SaveAgentDirectory + "/finalAgent.mat",'agent')
% Displaying the training options in the workspace shows, for example:
%trainOpts =
% rlTrainingOptions with properties:
%
% MaxEpisodes: 1000
% MaxStepsPerEpisode: 1000
% ScoreAveragingWindowLength: 5
% StopTrainingCriteria: "AverageReward"
% StopTrainingValue: 480
% SaveAgentCriteria: "none"
% SaveAgentValue: "none"
% SaveAgentDirectory: "savedAgents"
%%
doTraining = false;
if doTraining
% Train the agent.
trainingStats = train(agent,env,trainingOpts);
else
% Load a pretrained agent for the example.
load('Trains/savedAgents_1/finalAgent','agent')
end
simOptions = rlSimulationOptions('MaxSteps',maxsteps);
experience = sim(env,agent,simOptions);
%%
function in = localResetFcn(in)
% reset
in = setVariable(in,'e1_initial', 0.5*(-1+2*rand)); % random value for lateral deviation
in = setVariable(in,'e2_initial', 0.1*(-1+2*rand)); % random value for relative yaw angle
end
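For completeness, a reset function like the one above is typically attached to the environment before training so that initial conditions are randomized each episode (a sketch, assuming `env` is the Simulink environment created earlier in the script):

```matlab
% Randomize initial conditions at the start of every training episode
env.ResetFcn = @(in)localResetFcn(in);
```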
Answers (1)
Emmanouil Tzorakoleftherakis
on 8 Jan 2021
Edited: Emmanouil Tzorakoleftherakis
on 8 Jan 2021
Hello,
Please take a look at how to create the actor and critic networks for continuous PPO here. It seems there is a dimension mismatch, and following the doc example should help.
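For reference, one common cause of this exact error: when the network input is a featureInputLayer (as in the posted code), the data layout is channel-first, so the mean and standard-deviation paths should be concatenated along dimension 1; concatenating along dimension 3 matches the older imageInputLayer-based examples. A minimal sketch of the change, assuming the rest of the posted network stays the same:

```matlab
% Concatenate mean and standard deviation along the channel dimension (dim 1)
% to form a single 4-by-1 output (2 means followed by 2 standard deviations)
outLayer = concatenationLayer(1,2,'Name','mean&sdev');
```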
If you are using R2020b, there is a new feature that lets you create a PPO agent without creating the actor and critic neural networks - Reinforcement Learning Toolbox will create a default architecture for you that you can then modify as needed. Please take a look at this example to see how to implement this.
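As a sketch of that default-agent approach (assuming R2020b or later; the option name below follows rlAgentInitializationOptions):

```matlab
% Let the toolbox build default actor/critic networks for a continuous PPO agent
initOpts = rlAgentInitializationOptions('NumHiddenUnit',100);
agent = rlPPOAgent(observationInfo,actionInfo,initOpts);
% The generated networks can then be inspected and modified as needed:
actorNet = getModel(getActor(agent));
```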
2 Comments
Emmanouil Tzorakoleftherakis
on 10 Jan 2021
You can change the learn rate using rlRepresentationOptions
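For example (a sketch; the options object must be passed when the representation is created, as in the original script):

```matlab
% Set a custom learn rate for the actor representation
actorOptions = rlRepresentationOptions('LearnRate',5e-4);
actor = rlStochasticActorRepresentation(actorNetwork,observationInfo,actionInfo,...
    'Observation',{'observations'},actorOptions);
```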