DDPG agent does not learn
Hi, I'm using a DDPG algorithm to tune the gains of a PD-like controller (transpose Jacobian). My gains need to be between 0.00001 and 0.01, and based on this range I set my noise variance so that variance*sqrt(sample time) = 10% of the range.
But my agent does not learn: it only shows occasional peaks and then falls back to the minimum again. I don't know why this is happening.
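
For reference, the noise rule stated above can be worked through numerically. This is only a sketch: it assumes the quantity called "variance" is the value assigned to NoiseOptions.StandardDeviation, and it uses the gain range and the 0.001 sample time given in the question. The variable names are made up for illustration.

```matlab
% Values taken from the question (variable names are hypothetical)
gainMin = 1e-5;          % lower bound of the gain range
gainMax = 1e-2;          % upper bound of the gain range
Ts      = 1e-3;          % agent sample time in seconds

% Rule from the question: noise * sqrt(Ts) = 10% of the action range
targetFraction = 0.10;
actionRange    = gainMax - gainMin;
noiseStd       = targetFraction * actionRange / sqrt(Ts);

fprintf('Noise value implied by the rule: %.4f\n', noiseStd);
```

Under these assumptions the rule gives roughly 0.0316, which is close to the 0.03 used in the agent options, so the noise setting itself looks consistent with the stated rule.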

The structure of my networks is:
statepath = [featureInputLayer(numObs , Name = 'stateinp')
fullyConnectedLayer(96,Name = 'stateFC1')
reluLayer
fullyConnectedLayer(74,Name = 'stateFC2')
reluLayer
fullyConnectedLayer(36,Name = 'stateFC3')]
actionpath = [featureInputLayer(numAct, Name = 'actinp')
fullyConnectedLayer(72,Name = 'actFC1')
reluLayer
fullyConnectedLayer(36,Name = 'actFC2')]
commonpath = [additionLayer(2,Name = 'add')
fullyConnectedLayer(96,Name = 'FC1')
reluLayer
fullyConnectedLayer(72,Name = 'FC2')
reluLayer
fullyConnectedLayer(24,Name = 'FC3')
reluLayer
fullyConnectedLayer(1,Name = 'output')]
critic_network = layerGraph()
critic_network = addLayers(critic_network,actionpath)
critic_network = addLayers(critic_network,statepath)
critic_network = addLayers(critic_network,commonpath)
critic_network = connectLayers(critic_network,'actFC2','add/in1')
critic_network = connectLayers(critic_network,'stateFC3','add/in2')
plot(critic_network)
critic = dlnetwork(critic_network)
criticOptions = rlOptimizerOptions('LearnRate',1e-03,'GradientThreshold',1);
critic = rlQValueFunction(critic,obsInfo,actInfo,...
'ObservationInputNames','stateinp','ActionInputNames','actinp');
%% actor
actorNetwork = [featureInputLayer(numObs,Name = 'observation')
fullyConnectedLayer(72,Name = 'actorFC1')
reluLayer
fullyConnectedLayer(48,Name='actorFc2')
reluLayer
fullyConnectedLayer(36,Name='actorFc3')
reluLayer
fullyConnectedLayer(numAct,Name='output')
tanhLayer
scalingLayer(Name = 'actorscaling',scale = max(actInfo.UpperLimit))]
actorNetwork = dlnetwork(actorNetwork);
actorOptions = rlOptimizerOptions('LearnRate',5e-04,'GradientThreshold',1);
actor = rlContinuousDeterministicActor(actorNetwork,obsInfo,actInfo);
%% agent
agentOptions = rlDDPGAgentOptions(...
'SampleTime',0.001,...
'ActorOptimizerOptions',actorOptions,...
'CriticOptimizerOptions',criticOptions,...
'ExperienceBufferLength',1e6,...
'MiniBatchSize',128);
agentOptions.NoiseOptions.StandardDeviation = 0.03;
agentOptions.NoiseOptions.StandardDeviationDecayRate = 1e-5;
agent_MTJ_rl_mobilemanipualtor9 = rlDDPGAgent(actor,critic,agentOption
Answers (2)
Mrutyunjaya Hiremath
on 23 Jul 2023
Check this
% Define the observation and action space
numObs = 4; % Replace with the actual number of observation features
numAct = 2; % Replace with the actual number of action dimensions
% Create the actor network
actorNetwork = [
featureInputLayer(numObs, 'Name', 'observation')
fullyConnectedLayer(72, 'Name', 'actorFC1')
reluLayer
fullyConnectedLayer(48, 'Name', 'actorFC2')
reluLayer
fullyConnectedLayer(36, 'Name', 'actorFC3')
reluLayer
fullyConnectedLayer(numAct, 'Name', 'output')
tanhLayer
scalingLayer('Name', 'actorscaling', 'Scale', max(actInfo.UpperLimit))
];
actorNetwork = dlnetwork(actorNetwork);
% Create the critic network
statePath = [
featureInputLayer(numObs, 'Name', 'stateinp')
fullyConnectedLayer(96, 'Name', 'stateFC1')
reluLayer
fullyConnectedLayer(74, 'Name', 'stateFC2')
reluLayer
fullyConnectedLayer(36, 'Name', 'stateFC3')
];
actionPath = [
featureInputLayer(numAct, 'Name', 'actinp')
fullyConnectedLayer(72, 'Name', 'actFC1')
reluLayer
fullyConnectedLayer(36, 'Name', 'actFC2')
];
commonPath = [
additionLayer(2, 'Name', 'add')
fullyConnectedLayer(96, 'Name', 'FC1')
reluLayer
fullyConnectedLayer(72, 'Name', 'FC2')
reluLayer
fullyConnectedLayer(24, 'Name', 'FC3')
reluLayer
fullyConnectedLayer(1, 'Name', 'output')
];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork, statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = addLayers(criticNetwork, commonPath);
criticNetwork = connectLayers(criticNetwork, 'actFC2', 'add/in1');
criticNetwork = connectLayers(criticNetwork, 'stateFC3', 'add/in2');
% Convert the layer graph to a dlnetwork; keep the name "critic" free
% for the function approximator created below
criticNetwork = dlnetwork(criticNetwork);
% Create the actor and critic optimizer options
actorOptions = rlOptimizerOptions('LearnRate', 5e-4);
criticOptions = rlOptimizerOptions('LearnRate', 1e-3);
% Create the actor and critic function approximators
% (obsInfo and actInfo come from your environment)
actor = rlContinuousDeterministicActor(actorNetwork, obsInfo, actInfo);
critic = rlQValueFunction(criticNetwork, obsInfo, actInfo, ...
'ObservationInputNames', 'stateinp', 'ActionInputNames', 'actinp');
% Create the DDPG agent; the optimizer options go into the agent options,
% while the actor and critic are passed to rlDDPGAgent directly
agentOptions = rlDDPGAgentOptions(...
'SampleTime', 0.001,...
'ActorOptimizerOptions', actorOptions,...
'CriticOptimizerOptions', criticOptions,...
'ExperienceBufferLength', 1e6,...
'MiniBatchSize', 128);
agentOptions.NoiseOptions.StandardDeviation = 0.03;
agentOptions.NoiseOptions.StandardDeviationDecayRate = 1e-5;
agent = rlDDPGAgent(actor, critic, agentOptions);
% Train the agent
trainOpts = rlTrainingOptions(...
'MaxEpisodes', 1000,...
'MaxStepsPerEpisode', 1000,...
'ScoreAveragingWindowLength', 5,...
'Plots', 'training-progress');
trainingStats = train(agent, env, trainOpts);
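
One thing that may be worth checking in the actor above: tanhLayer outputs values in [-1, 1], so a scalingLayer with only Scale = max(actInfo.UpperLimit) produces actions in [-0.01, 0.01], which includes negative gains outside the stated range of 0.00001 to 0.01. A minimal sketch of an affine mapping onto that range follows. It assumes scalar limits taken from the question, and the variable names are made up for illustration.

```matlab
% Assumed action limits, taken from the gain range stated in the question
lowerLimit = 1e-5;
upperLimit = 1e-2;

% tanh outputs values in [-1, 1]; the affine map Scale.*x + Bias sends
% -1 to lowerLimit and +1 to upperLimit
actScale = (upperLimit - lowerLimit) / 2;
actBias  = (upperLimit + lowerLimit) / 2;

% Check the end points of the mapping
mappedMin = actScale * (-1) + actBias;
mappedMax = actScale * ( 1) + actBias;
fprintf('Mapped range: [%g, %g]\n', mappedMin, mappedMax);

% Layer that could replace the final scalingLayer of the actor
% (requires Reinforcement Learning Toolbox)
actorScaling = scalingLayer('Name', 'actorscaling', ...
    'Scale', actScale, 'Bias', actBias);
```

Whether this resolves the learning problem is not certain, but it keeps every action the actor proposes inside the range the controller expects.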
awcii
on 24 Jul 2023
.
1 Comment
Harold
on 31 Mar 2025
@awcii Hello, what would you like to discuss? Or I am still wondering where. Please be clear about the problems you are having. If it matches my understanding, I am ready to help.