Why did my AC agent converge to the minimal reward?
Hello everyone!
I trained an AC agent, but it converged to a policy that yields the minimal reward. I'm not sure whether the problem lies in the neural network or in the environment. The rewards are negative because I want to minimize a volume. I changed two parameters, learning rate = 0.05 and EntropyLossWeight = 0.01; all other parameters are left at their defaults. I don't know which parameters deserve particular attention.

When I lowered the learning rate to 0.0005, training no longer converged.
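A minimal sketch of how these two settings can be specified for an AC agent, assuming Reinforcement Learning Toolbox R2022a or later (where rlOptimizerOptions is available). The variable name agentOpts is a placeholder, and using the same learning rate for actor and critic is an assumption made only for illustration:

```matlab
% Sketch only: option values mirror the ones mentioned above.
learnRate = 0.05;                          % the value that converged to the minimal reward
agentOpts = rlACAgentOptions(EntropyLossWeight=0.01);

% Assumption: the same learning rate is used for actor and critic.
agentOpts.ActorOptimizerOptions  = rlOptimizerOptions(LearnRate=learnRate);
agentOpts.CriticOptimizerOptions = rlOptimizerOptions(LearnRate=learnRate);

% The agent would then be built from the actor and critic defined in this post:
% agent = rlACAgent(actor, critic, agentOpts);
```

Keeping the actor and critic optimizer options separate makes it possible to lower only one of the two learning rates when experimenting.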

Here are the actor and critic. I want the actor to output values in [0, 1]:
%% neural network
nnc = [
featureInputLayer(prod(obsInfo.Dimension), 'Name', 'input_c')
fullyConnectedLayer(Knoten, 'Name', 'fc_c1')
reluLayer('Name', 'relu1')
fullyConnectedLayer(Knoten, 'Name', 'fc_c2')
reluLayer('Name', 'relu2')
fullyConnectedLayer(1, 'Name', 'output')];
nnc = dlnetwork(nnc);
critic = rlValueFunction(nnc,obsInfo);
% getValue(critic,{rand(obsInfo.Dimension)})
input_actor = [
featureInputLayer( ...
prod(obsInfo.Dimension), ...
Name="input_a")
fullyConnectedLayer( ...
prod(actInfo.Dimension), ...
Name="in_fc")
];
nna1 = [
tanhLayer(Name="tanhMean");
fullyConnectedLayer(prod(actInfo.Dimension),"Name", 'fc_mean');
sigmoidLayer(Name="output_mean")
];
nna2 = [
tanhLayer(Name="tanhStdv");
fullyConnectedLayer(prod(actInfo.Dimension),"Name", 'fc_div');
softplusLayer(Name="output_div")
];
nna = layerGraph(input_actor);
nna = addLayers(nna,nna1);
nna = addLayers(nna,nna2);
nna = connectLayers(nna,"in_fc","tanhMean/in");
nna = connectLayers(nna,"in_fc","tanhStdv/in");
% plot(nna)
nna = dlnetwork(nna);
% summary(nna)
actor = rlContinuousGaussianActor(nna, obsInfo, actInfo, ...
ActionMeanOutputNames="output_mean",...
ActionStandardDeviationOutputNames="output_div",...
ObservationInputNames="input_a");
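One thing that may be worth checking: the sigmoid layer only bounds the mean of the Gaussian to [0, 1]. The action itself is sampled around that mean, so it can still fall outside the interval, and as far as I know the AC agent does not enforce the limits in actInfo for continuous actions. A minimal sketch of clipping the action before it is used in the environment, where clipAction is a hypothetical helper and the numbers are made up:

```matlab
% Hypothetical helper: clip every action component to [0, 1].
clipAction = @(a) min(max(a, 0), 1);

a = [-0.2; 0.4; 1.3; 0.9];   % made-up raw sample from a Gaussian policy
aClipped = clipAction(a);    % every component now lies in [0, 1]
disp(aClipped')
```

In the step function this would be applied once, before the action components are unpacked.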
The step function of the environment:
function [nextobs,reward,isdone,loggedSignals] = step(this,action)
% unpack actions
this.Robot.x = action(1);
this.Robot.y = action(2);
this.Robot.NumOfHeight = action(3);
this.Robot.NumOfAngle = action(4);
[~, ~, ~, this.Volumennew, ~]=PrunnedTreeGenerator(this.Robot.x, this.Robot.y, this.Robot.NumOfHeight, this.Robot.NumOfAngle, 3,...
0.8, this.H, this.Bin_In_Training, this.RB, 0.5, 1);
% Store the new volume when it is no larger than the smallest volume found so far
if this.Volumennew<=min(this.volume_tree_Collection)
this.volume_tree = this.Volumennew;
%disp(this.volume_tree)
this.volume_tree_Collection = [this.volume_tree_Collection;...
this.volume_tree];
end
reward = -this.volume_tree/(0.5^2*pi*sum(this.Bin_In_Training(:, 3)));
% Earlier termination rule (disabled): stop the episode when a volume
% larger than the current minimum is found; a negative reward is given
%isdone = this.Volumennew>=min(this.volume_tree_Collection);
Distance = distanceCalculator(this, this.Robot.x, this.Robot.y);
Mean = meanXYZ(this);
Sigma_Square=getSigma(this);
DivisionSize=getSizeDivision(this);
isdone = this.l>=24;
if ~isdone
this.l=this.l+1;
nextobs = [this.Robot.x, this.Robot.y, this.Robot.NumOfHeight, this.Robot.NumOfAngle, this.volume_tree, size(this.Bin_In_Training, 1), Distance, Mean, Sigma_Square, DivisionSize]';
%reward = sum(1+this.l)/this.l;
this.State = [this.Robot.x, this.Robot.y, this.Robot.NumOfHeight, this.Robot.NumOfAngle, this.volume_tree, size(this.Bin_In_Training, 1), Distance, Mean, Sigma_Square, DivisionSize]';
% If isdone is false, a minimum has been found,
% so a positive reward is given
else
this.l=this.l+1;
%disp(this.State)
%disp(this.volume_tree)
nextobs = [this.Robot.x, this.Robot.y, this.Robot.NumOfHeight, this.Robot.NumOfAngle, this.volume_tree, size(this.Bin_In_Training, 1), Distance, Mean, Sigma_Square, DivisionSize]';
this.StepState = [this.StepState;this.k this.l nextobs'];
this.k=this.k+1;
%reward = ;
end
this.State = nextobs;
this.IsDone = isdone;
loggedSignals = nextobs;
end
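Reading the step function again, the reward is computed from this.volume_tree, which only changes when a new minimum is found. An action that produces a larger volume therefore receives the same reward as the best one so far, which might be part of the problem. The sketch below compares the two variants with made-up numbers; heights, bestVolume and newVolume are hypothetical stand-ins for Bin_In_Training(:, 3), this.volume_tree and this.Volumennew:

```matlab
% Sketch with made-up numbers, not values from my environment.
heights   = [2; 3; 5];                  % stand-in for Bin_In_Training(:, 3)
refVolume = 0.5^2 * pi * sum(heights);  % same normalization as in the step function

bestVolume = 4.0;   % stand-in for this.volume_tree (best volume so far)
newVolume  = 6.0;   % stand-in for this.Volumennew (result of the current action)

rewardBest = -bestVolume / refVolume;   % what the step function currently returns
rewardNew  = -newVolume  / refVolume;   % variant that depends on the current action

fprintf('reward from best volume: %.4f\n', rewardBest);
fprintf('reward from new volume:  %.4f\n', rewardNew);
```

With the second variant a worse action gets a lower reward, so the reward signal distinguishes between actions within an episode.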
I hope someone can help!
Thanks!
Kun