Deep reinforcement learning (PPO): How do you fix this error? Asking for help

Question

祥 on 19 Mar 2024


https://uk.mathworks.com/matlabcentral/answers/2096191-deep-reinforcement-learning____ppo-how-do-you-fix-this-error-ask-for-help

Answered: Ronit on 27 Mar 2024


Hi everyone,

I am using the PPO algorithm to train an agent in a custom environment, but I get an error.

I think it may be related to obsInfo, but I don't know how to fix it. My code and the error log are below.

Please help me, I would be very grateful.

slx = 'RLcontrolstrategy0312';   
open_system(slx);
agentblk = slx +"/agent";
%obsinfo actinfo
%Is that the problem?
obsInfo=rlNumericSpec([49,1], ...
    'LowerLimit',0, ...
    'UpperLimit',1);
actInfo = rlNumericSpec([6,1], 'LowerLimit',[0 0 0 -1 -1 -1]','UpperLimit',[1 1 1 1 1 1]'); 
scale = [0.5 0.5 0.5 1 1 1]';
bias = [0.5 0.5 0.5 0 0 0]';
env = rlSimulinkEnv(slx,agentblk,obsInfo,actInfo);
Ts = 0.001;
Tf = 4;
rng(0)
%critic
cnet = [
    featureInputLayer(9,"Normalization","none","Name","observation1")
    fullyConnectedLayer(256,"Name","fc1")
    concatenationLayer(1,3,"Name","concat")
    tanhLayer("Name","tanh1")
    fullyConnectedLayer(256,"Name","fc2")
    tanhLayer("Name","tanh2")
    fullyConnectedLayer(128,"Name","fc3")
    tanhLayer("Name","tanh3")
    fullyConnectedLayer(64,"Name","fc4")
    tanhLayer("Name","tanh4")
    fullyConnectedLayer(32,"Name","fc5")
    tanhLayer("Name","tanh5")
    fullyConnectedLayer(1,"Name","CriticOutput")];
cnetMCT=[
    featureInputLayer(20,"Normalization","none","Name","observation2")
    fullyConnectedLayer(256,"Name","fc11")
    tanhLayer("Name","tanh13")
    fullyConnectedLayer(64,"Name","fc14")
    tanhLayer("Name","tanh14")
    fullyConnectedLayer(32,"Name","fc15")];
cnetMCR=[
    featureInputLayer(20,"Normalization","none","Name","observation3")
    fullyConnectedLayer(256,"Name","fc21")
    tanhLayer("Name","tanh23")
    fullyConnectedLayer(64,"Name","fc24")
    tanhLayer("Name","tanh24")
    fullyConnectedLayer(32,"Name","fc25")];
criticNetwork = layerGraph(cnet);
criticNetwork = addLayers(criticNetwork, cnetMCT);
criticNetwork = connectLayers(criticNetwork,"fc15","concat/in2");
criticNetwork = addLayers(criticNetwork, cnetMCR);
criticNetwork = connectLayers(criticNetwork,"fc25","concat/in3");
criticdlnet = dlnetwork(criticNetwork,'Initialize',false);
criticdlnet1 = initialize(criticdlnet);
%Is that the problem?
critic= rlValueFunction(criticdlnet1,obsInfo, ...
    ObservationInputNames=["observation1","observation2","observation3"]);
%actor
anet = [
    featureInputLayer(9,"Normalization","none","Name","ain1")
    fullyConnectedLayer(256,"Name","fc1")
    concatenationLayer(1,3,"Name","concat")
    tanhLayer("Name","tanh1")
    fullyConnectedLayer(256,"Name","fc2")
    tanhLayer("Name","tanh2")
    fullyConnectedLayer(128,"Name","fc3")
    tanhLayer("Name","tanh3")
    fullyConnectedLayer(64,"Name","fc4")
    tanhLayer("Name","tanh4")];
anetMCT=[
    featureInputLayer(20,"Normalization","none","Name","ain2")
    fullyConnectedLayer(256,"Name","fc11")
    tanhLayer("Name","tanh13")
    fullyConnectedLayer(64,"Name","fc14")
    tanhLayer("Name","tanh14")
    fullyConnectedLayer(32,"Name","fc15")];
anetMCR=[
    featureInputLayer(20,"Normalization","none","Name","ain3")
    fullyConnectedLayer(256,"Name","fc21")
    tanhLayer("Name","tanh23")
    fullyConnectedLayer(64,"Name","fc24")
    tanhLayer("Name","tanh24")
    fullyConnectedLayer(32,"Name","fc25")];
meanPath = [
    fullyConnectedLayer(32,"Name","meanFC")
    tanhLayer("Name","tanh5")
    fullyConnectedLayer(numAct,"Name","mean")
    tanhLayer("Name","tanh6")
    scalingLayer(Name="meanPathOut",Scale=scale,Bias=bias)];
stdPath = [
    fullyConnectedLayer(32,"Name","stdFC")
    tanhLayer("Name","tanh7")
    fullyConnectedLayer(numAct,"Name","fc5")
    softplusLayer("Name","std")];
actorNetwork = layerGraph(anet);
actorNetwork = addLayers(actorNetwork,anetMCT);
actorNetwork = addLayers(actorNetwork,anetMCR);
actorNetwork = connectLayers(actorNetwork,"fc15","concat/in2");
actorNetwork = connectLayers(actorNetwork,"fc25","concat/in3");
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,stdPath);
actorNetwork = connectLayers(actorNetwork,"tanh4","meanFC/in");
actorNetwork = connectLayers(actorNetwork,"tanh4","stdFC/in");
actordlnet = dlnetwork(actorNetwork);
%Is that the problem?
actor = rlContinuousGaussianActor(actordlnet,obsInfo,actInfo, ...
    "ActionMeanOutputNames","meanPathOut", ...
    "ActionStandardDeviationOutputNames","std", ...
    ObservationInputNames= ["ain1","ain2","ain3"]);
%agent
agentOptions=rlPPOAgentOptions("SampleTime",Ts,"DiscountFactor",0.995,"ExperienceHorizon",1024,"MiniBatchSize",512,"ClipFactor",0.2, ...
                               "EntropyLossWeight",0.01,"NumEpoch",8,"AdvantageEstimateMethod","gae","GAEFactor",0.98, ...
                               "NormalizedAdvantageMethod","current");
agent=rlPPOAgent(actor,critic,agentOptions);
%training
trainOptions=rlTrainingOptions("StopOnError","on", "MaxEpisodes",2000,"MaxStepsPerEpisode",floor(Tf/Ts), ...
                            "ScoreAveragingWindowLength",10,"StopTrainingCriteria","AverageReward", ...
                            "StopTrainingValue",100000,"SaveAgentCriteria","None", ...
                            "SaveAgentDirectory","D:\car\jianmo\zhangxiang\agent","Verbose",false, ...
                            "Plots","training-progress");
trainingStats = train(agent,env,trainOptions);

The error log is as follows:

Incorrect use of rl.internal.validate.mapFunctionObservationInput

Number of input layers for deep neural network must equal to number of observation specifications.

error rlValueFunction(Line 92)

modelInputMap = rl.internal.validate.mapFunctionObservationInput(model,observationInfo,nameValueArgs.ObservationInputNames);

error ppo(Line 187)

critic= rlValueFunction(criticdlnet1,obsInfo, ...


Answer 1

Ronit on 27 Mar 2024


https://uk.mathworks.com/matlabcentral/answers/2096191-deep-reinforcement-learning____ppo-how-do-you-fix-this-error-ask-for-help#answer_1431981


Hi,

Based on the error log, the problem is a mismatch between the number of input layers in your neural network and the number of observation specifications you defined. ‘rlValueFunction’ throws this error while constructing the critic: the critic network has three input layers, but ‘obsInfo’ contains only one specification.

You defined ‘obsInfo’ as a single ‘rlNumericSpec’ object, but when creating the critic with ‘rlValueFunction’ you specified three observation input names:

critic= rlValueFunction(criticdlnet1,obsInfo, ...
    ObservationInputNames=["observation1","observation2","observation3"]);

This discrepancy between the number of ‘obsInfo’ objects (1) and the number of observation input names (3) is the cause of the error.
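A quick way to see the mismatch is to compare the two counts directly. The snippet below is only a minimal sketch: it recreates ‘obsInfo’ and the input names from the question so it can run on its own (it needs Reinforcement Learning Toolbox), and the variable name ‘inputNames’ is made up for illustration.

```matlab
% Single 49-by-1 observation specification, as defined in the question
obsInfo = rlNumericSpec([49 1], 'LowerLimit',0, 'UpperLimit',1);

% Input layer names passed to rlValueFunction in the question.
% In the full script you could read them from the network instead,
% e.g. criticdlnet1.InputNames.
inputNames = ["observation1","observation2","observation3"];

% The two counts must be equal for rlValueFunction to accept the network
fprintf("Observation specifications: %d\n", numel(obsInfo));
fprintf("Network input layers:       %d\n", numel(inputNames));
```

With the code from the question, the first count is 1 and the second is 3, which is the condition the error message describes.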

To resolve this, make sure the number of ‘obsInfo’ objects matches the number of observation input names (and input layers) of your network. If your environment produces three distinct observations, define one ‘obsInfo’ object for each and pass them as a vector to ‘rlValueFunction’. The same ‘obsInfo’ vector also needs to go to ‘rlContinuousGaussianActor’ and ‘rlSimulinkEnv’, since your actor has three input layers as well.
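As a rough sketch of that fix, the example below defines three observation specifications and builds a critic from a three-input network. Several things here are assumptions rather than facts from your model: the 9 + 20 + 20 split of the 49 observation elements (taken from your featureInputLayer sizes), the order of the channels, the spec names, and the small stand-in network used in place of your full critic.

```matlab
% Assumption: the 49 observations split into 9, 20 and 20 elements,
% in this order, matching the featureInputLayer sizes in the question.
obs1 = rlNumericSpec([9 1],  'LowerLimit',0, 'UpperLimit',1);
obs2 = rlNumericSpec([20 1], 'LowerLimit',0, 'UpperLimit',1);
obs3 = rlNumericSpec([20 1], 'LowerLimit',0, 'UpperLimit',1);
obs1.Name = "observation1";   % names chosen for illustration only
obs2.Name = "observation2";
obs3.Name = "observation3";
obsInfo = [obs1 obs2 obs3];   % one specification per network input layer

% Small stand-in for the critic network in the question (three inputs,
% joined by a concatenation layer, one scalar output)
net = layerGraph([
    featureInputLayer(9,"Name","observation1")
    fullyConnectedLayer(32,"Name","fc1")
    concatenationLayer(1,3,"Name","concat")
    tanhLayer("Name","tanh1")
    fullyConnectedLayer(1,"Name","CriticOutput")]);
net = addLayers(net,[
    featureInputLayer(20,"Name","observation2")
    fullyConnectedLayer(32,"Name","fc15")]);
net = addLayers(net,[
    featureInputLayer(20,"Name","observation3")
    fullyConnectedLayer(32,"Name","fc25")]);
net = connectLayers(net,"fc15","concat/in2");
net = connectLayers(net,"fc25","concat/in3");
criticNet = dlnetwork(net);

% Three specifications now map onto three input layers, in the given order
critic = rlValueFunction(criticNet,obsInfo, ...
    ObservationInputNames=["observation1","observation2","observation3"]);

% Sanity check: evaluate the critic on one random observation per channel
v = getValue(critic,{rand(9,1),rand(20,1),rand(20,1)});
```

In your script you would pass the new ‘obsInfo’ vector to your own ‘criticdlnet1’ and ‘actordlnet’ rather than to this stand-in. Note that with several observation channels the Simulink model most likely also has to feed the RL Agent block a bus signal whose elements match these specifications; that part depends on your model and is not shown here.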