Training DDPG agent with custom training loop

平成 on 31 May 2025
Answered: Hitesh on 3 Jun 2025
Currently, I am designing a control system using deep reinforcement learning (DDPG) with the Reinforcement Learning Toolbox in MATLAB/Simulink. Specifically, I need to implement a custom training loop that does not rely on the train function. Could you please show me how to implement a custom training loop for training a DDPG agent? I would like to understand how to implement a standard DDPG-based control system using a custom training loop in MATLAB.
Below is the MATLAB code I currently use to train the DDPG agent with the train function. Could you convert it into a version that uses a custom training loop (without using train)?
obsInfo = rlNumericSpec([6 1]);
obsInfo.Name = "observations";
actInfo = rlNumericSpec([1 1]);
actInfo.Name = "control input";
mdl = "SIM_RL"; % Simulink model containing the plant and the RL Agent block
env = rlSimulinkEnv(mdl, ...
    mdl + "/Agent/RL Agent", ...
    obsInfo, actInfo);
% Domain randomization: Reset function
env.ResetFcn = @(in)localResetFcn(in);
function in = localResetFcn(in)
    % Fixed range of plant parameter
    M_min = Nominal_value*(1 - 0.5); % -50% of nominal mass
    M_max = Nominal_value*(1 + 0.5); % +50% of nominal mass
    % Randomize mass
    randomValue_M = M_min + (M_max - M_min) * rand;
    in = setBlockParameter(in, ...
        "SIM_RL/Plant/Mass", ...
        Value=num2str(randomValue_M));
end
% The construction of the critic network is omitted here.
% ....
criticNet = initialize(criticNet);
critic = rlQValueFunction(criticNet,obsInfo,actInfo);
% The construction of the actor network is omitted here.
% ....
actorNet = initialize(actorNet);
actor = rlContinuousDeterministicActor(actorNet,obsInfo,actInfo);
% Set-up agent
criticOpts = rlOptimizerOptions(LearnRate=1e-04,GradientThreshold=1);
actorOpts = rlOptimizerOptions(LearnRate=1e-04,GradientThreshold=1);
agentOpts = rlDDPGAgentOptions(...
    SampleTime=0.01,...
    CriticOptimizerOptions=criticOpts,...
    ActorOptimizerOptions=actorOpts,...
    ExperienceBufferLength=1e5,...
    DiscountFactor=0.99,...
    MiniBatchSize=128,...
    TargetSmoothFactor=1e-3);
agent = rlDDPGAgent(actor,critic,agentOpts);
maxepisodes = 5000;
maxsteps = ceil(Simulation_End_Time/0.01);
trainOpts = rlTrainingOptions(...
    MaxEpisodes=maxepisodes,...
    MaxStepsPerEpisode=maxsteps,...
    ScoreAveragingWindowLength=5,...
    Verbose=true,...
    Plots="training-progress",...
    StopTrainingCriteria="EpisodeCount",...
    SaveAgentCriteria="EpisodeReward",...
    SaveAgentValue=-1.0);
doTraining = true;
if doTraining
    evaluator = rlEvaluator(...
        NumEpisodes=1,...
        EvaluationFrequency=5);
    % Train the agent.
    trainingStats = train(agent,env,trainOpts,Evaluator=evaluator);
else
    % Load the pretrained agent.
    load("agent.mat","agent")
end

Answers (1)

Hitesh on 3 Jun 2025
Hi 平成,
To convert your DDPG training setup from the "train" function into a custom training loop in MATLAB, you replace the single call to "train" with an explicit loop over episodes. The custom loop gives you greater control over training, evaluation, logging, and integration with domain randomization.
The main components of a custom training loop are:
  • Environment Reset: Start each episode by resetting the environment.
  • Action Selection: Use the actor network to select an action based on the current observation.
  • Environment Step: Apply the action to the environment and collect the next observation, reward, and done flag (for a Simulink model, the environment is stepped by running the model, for example through runEpisode).
  • Experience Storage: Store the transition (state, action, reward, next state, done) in a replay buffer (a minimal replay-memory sketch follows this list).
  • Learning: Sample mini-batches from the buffer and perform gradient updates on the actor and critic networks.
  • Target Updates: Soft update the target networks (actor and critic) toward the main networks (a soft-update sketch also follows this list).
  • Logging & Evaluation: Track performance (e.g., cumulative reward) and optionally evaluate the agent periodically (an evaluation sketch follows the example loop below).
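If you prefer to manage the replay memory yourself (for example, when you also update the networks manually instead of letting the agent do it), the storage and sampling steps could look like the following minimal sketch. It assumes a stand-alone rlReplayMemory object and uses hypothetical variable names (currentObs, currentAction, reward, nextObs, isDone) for a single transition.
% Minimal sketch (assumption): stand-alone replay memory for the 6x1 observation and 1x1 action specs
buffer = rlReplayMemory(obsInfo, actInfo, 1e5);
numStored = 0;
% Store one transition as a structure with the fields that append expects
transition.Observation     = {currentObs};    % hypothetical current observation (6x1)
transition.Action          = {currentAction}; % hypothetical applied action (1x1)
transition.Reward          = reward;          % scalar reward for this step
transition.NextObservation = {nextObs};       % observation after the step
transition.IsDone          = isDone;          % 1 if the episode terminated
append(buffer, transition);
numStored = numStored + 1;
% Once enough transitions are stored, draw a mini-batch for the network updates
if numStored >= 128
    miniBatch = sample(buffer, 128);
end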
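For the target-update step, here is a minimal sketch of the soft (Polyak) update, assuming you keep your own targetCritic copy of the critic (a hypothetical variable, since a built-in agent maintains its target networks internally).
% Minimal sketch (assumption): soft update of a target critic toward the main critic
tau = 1e-3; % same role as TargetSmoothFactor
params       = getLearnableParameters(critic);
targetParams = getLearnableParameters(targetCritic);
for k = 1:numel(params)
    targetParams{k} = tau*params{k} + (1 - tau)*targetParams{k};
end
targetCritic = setLearnableParameters(targetCritic, targetParams);
% The target actor is updated in the same way.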
Kindly refer to the following custom training loop as an example. It uses the "setup", "runEpisode", and "cleanup" functions, which run episodes against the Simulink environment without calling "train". When the agent itself is passed to "runEpisode", it adds the collected experiences to its replay buffer and updates the actor, critic, and target networks while the episode runs, and the ResetFcn you configured for domain randomization is still applied at the start of every episode.
% Create agent
agent = rlDDPGAgent(actor, critic, agentOpts);
% Logging
episodeRewards = zeros(maxepisodes,1);
% Prepare the environment for running multiple episodes
setup(env);
% Custom Training Loop
for episode = 1:maxepisodes
    % Run one episode. Because the agent (not just its policy) is passed in,
    % it stores the experiences in its replay buffer and updates the actor,
    % critic, and target networks during the episode.
    out = runEpisode(env, agent, ...
        MaxSteps=maxsteps, ...
        CleanupPostSim=false);
    % Log the cumulative reward reported for this episode
    totalReward = out.AgentData.EpisodeInfo.CumulativeReward;
    episodeRewards(episode) = totalReward;
    fprintf("Episode %d: Total Reward = %.2f\n", episode, totalReward);
    % Optional: periodically save the agent
    if mod(episode, 50) == 0
        save(sprintf("agent_episode_%d.mat", episode), "agent");
    end
end
% Release the environment after training
cleanup(env);
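To mirror the rlEvaluator you used with "train", you could also run an occasional evaluation episode inside the loop with the agent's greedy (noise-free) policy. This is only a sketch under the assumptions above; evalFrequency is an assumed variable name, not a toolbox setting.
% Sketch (assumption): every evalFrequency training episodes, run one evaluation episode
evalFrequency = 5;
if mod(episode, evalFrequency) == 0
    greedyPolicy = getGreedyPolicy(agent); % deterministic policy, does not learn
    evalOut = runEpisode(env, greedyPolicy, ...
        MaxSteps=maxsteps, CleanupPostSim=false);
    % Use evalOut to decide whether to save the current agent as the best so far.
end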
For more information regarding the DDPG training algorithm, kindly refer to the DDPG agent page in the Reinforcement Learning Toolbox documentation.

Release

R2024b
