PPO Agent training - Is it possible to control the number of epochs dynamically?
Federico Toso on 17 Mar 2024
Commented: Federico Toso on 25 Mar 2024
In the default implementation of the PPO agent in MATLAB, the number of epochs is a static property that must be selected before training starts.
However, I've seen that state-of-the-art implementations of PPO sometimes select the number of epochs dynamically: at each learning phase, the algorithm decides whether to execute another epoch based on the KL divergence it has just calculated. This seems to improve the robustness of the algorithm significantly.
Is it possible for a user to implement such a routine in MATLAB in the context of PPO training, possibly by applying some slight modifications to the default process?
Accepted Answer
Kartik Saxena on 22 Mar 2024
Hi,
Given below is a pseudocode sketch of the logic you can adapt for this purpose. Note that collectExperiences, getPolicy, updateAgent, and calculateKLDivergence are placeholders for your own implementations, and the loop stops early once the updated policy drifts too far from the old one:
% Assume env is your environment and agent is your PPO agent
for episode = 1:maxEpisodes
    % Gather a batch of trajectories with the current policy
    experiences = collectExperiences(env, agent);
    klDivergence = 0;
    epochCount = 0;
    % Keep training on this batch while the policy change stays small
    while klDivergence < klThreshold && epochCount < maxEpochs
        oldPolicy = getPolicy(agent);            % snapshot the policy before updating
        agent = updateAgent(agent, experiences); % one epoch of gradient updates
        newPolicy = getPolicy(agent);
        % Measure how far the new policy has moved from the old one
        klDivergence = calculateKLDivergence(oldPolicy, newPolicy);
        epochCount = epochCount + 1;
    end
end
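For concreteness, here is a minimal sketch of what the calculateKLDivergence placeholder could compute for a discrete-action policy, where each policy is represented by its action-probability vector for a given state. The probability values below are purely illustrative; for a continuous Gaussian policy you would instead use the closed-form KL between the two Gaussians. A common heuristic in public PPO implementations is to stop updating once the mean KL exceeds roughly 1.5 times a small target value (e.g. 0.01).

```matlab
% Illustrative KL divergence between old and new discrete action distributions.
% p and q are example action-probability vectors (assumed values, not from a real agent).
p = [0.5 0.3 0.2];   % old policy: P(a | s)
q = [0.4 0.4 0.2];   % new policy after one epoch of updates

% KL(p || q) = sum_i p_i * log(p_i / q_i)
klDivergence = sum(p .* log(p ./ q));

% Example early-stopping check against a small target KL
klThreshold = 0.015;
stopEpochs = klDivergence > klThreshold;
```

In a full implementation you would average this quantity over the states in the experience batch rather than evaluate it at a single state.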
Additionally, you can refer to the MathWorks documentation and examples on custom training loops and PPO agents to build your custom implementation.
I hope it helps!