PPO Agent training - Is it possible to control the number of epochs dynamically?
In the default implementation of the PPO agent in MATLAB, the number of epochs is a static property that must be set before training starts.
However, I've seen that state-of-the-art implementations of PPO sometimes select the number of epochs dynamically: in each learning phase, the algorithm decides whether or not to run another epoch based on the KL divergence it has just computed. This seems to improve the robustness of the algorithm significantly.
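To make the idea concrete, here is a minimal sketch of the kind of routine I mean. It is written in Python rather than MATLAB so that it does not depend on any Reinforcement Learning Toolbox API. All names (`approx_kl`, `run_epochs_with_kl_stop`, `run_epoch`, `target_kl`, `tolerance`) are hypothetical, and the 1.5 tolerance factor is an assumption borrowed from a convention used in some open-source PPO code, not a fixed part of the algorithm.

```python
def approx_kl(old_log_probs, new_log_probs):
    """Sample-based estimate of KL(old || new).

    Uses the mean of (log pi_old(a|s) - log pi_new(a|s)) over actions that
    were sampled from the old (data-collecting) policy.
    """
    diffs = [o - n for o, n in zip(old_log_probs, new_log_probs)]
    return sum(diffs) / len(diffs)


def run_epochs_with_kl_stop(run_epoch, max_epochs, target_kl, tolerance=1.5):
    """Run at most max_epochs epochs, stopping early on a large KL.

    run_epoch(epoch_index) is a hypothetical callback: it performs one pass
    of gradient updates over the collected batch and returns the KL estimate
    between the data-collecting policy and the updated policy.

    Returns (epochs_completed, last_kl).
    """
    last_kl = 0.0
    for epoch in range(max_epochs):
        last_kl = run_epoch(epoch)
        # Decide whether to run another epoch from the KL just computed.
        if last_kl > tolerance * target_kl:
            return epoch + 1, last_kl
    return max_epochs, last_kl


if __name__ == "__main__":
    # Toy stand-in for training: the KL grows by 0.004 after every epoch.
    epochs_done, kl = run_epochs_with_kl_stop(
        lambda epoch: 0.004 * (epoch + 1), max_epochs=10, target_kl=0.01
    )
    print(epochs_done, kl)
```

In this toy run the loop stops once the KL estimate exceeds 0.015 (1.5 times the target), so fewer than the 10 allowed epochs are executed. In a real agent, `run_epoch` would wrap the mini-batch updates of the actor and critic.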
Is it possible for a user to implement such a routine in MATLAB in the context of PPO training, possibly by making some slight modifications to the default process?