Parallel reinforcement learning in separate runs leads to strange learning curve

Mirjan Heubaum on 23 Dec 2021

https://uk.mathworks.com/matlabcentral/answers/1616425-parallel-reinforcement-learning-in-separate-runs-leads-to-strange-learning-curve

Edited: Mirjan Heubaum on 23 Dec 2021

I'm training a DDPG reinforcement learning agent on an HPC cluster node with Parallel Computing Toolbox. Because of errors I ran into when training for many more episodes in a single run, each run covers only 400 episodes. I then save the agent, including its experience buffer, and repeat the training in a loop. I start each training run with

agent.AgentOptions.ResetExperienceBufferBeforeTraining = false;
agent.AgentOptions.SaveExperienceBufferWithAgent=true;
trainingStats = train(agent,env,trainOpts);

and save the agent with

agent.AgentOptions.SaveExperienceBufferWithAgent=true;
save(filename, 'agent', '-v7.3');

I can see that the experience buffer keeps growing, because

agent.ExperienceBuffer.Length

becomes larger. When resuming, I use

load(PRE_TRAINED_MODEL_FILE,'agent');
agent.AgentOptions.NoiseOptions.Variance = [1200;400;2;1000].*exp(pastepisodes*log(1-agentOpts.NoiseOptions.VarianceDecayRate));

to reproduce the noise variance decay I would expect from a single, uninterrupted training run. The learning rate is 5e-03 for the critic and 1e-03 for the actor.
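
The variance expression above can be checked against a step-by-step decay. The following is a minimal sketch, not a reproduction of the toolbox internals: `decayRate` and `numRuns` are hypothetical placeholder values that are not given in the post, and the sketch assumes one decay update per episode, as the expression with `pastepisodes` implies.

```matlab
% Sketch: carry the exploration-noise variance across separate training runs.
% decayRate and numRuns are hypothetical placeholders, not values from the post.
initialVariance = [1200; 400; 2; 1000];   % initial variance, from the post
decayRate       = 1e-4;                   % hypothetical VarianceDecayRate
episodesPerRun  = 400;                    % episodes per run, from the post
numRuns         = 3;                      % hypothetical number of completed runs

% Assumption: one decay update per episode
pastEpisodes = episodesPerRun * numRuns;

% Closed form used in the post: v0 * exp(n*log(1-r)) equals v0 * (1-r)^n
closedForm = initialVariance .* exp(pastEpisodes * log(1 - decayRate));

% Reference: apply the decay one update at a time
stepwise = initialVariance;
for k = 1:pastEpisodes
    stepwise = stepwise .* (1 - decayRate);
end

% The two results should agree up to floating-point rounding
maxRelError = max(abs(closedForm - stepwise) ./ stepwise);
fprintf('Max relative difference: %.3g\n', maxRelError);
```

If the toolbox applies the decay at every agent step rather than once per episode (worth checking in the `rlDDPGAgentOptions` documentation for the release in use), the exponent would need to be the total number of steps taken so far instead of the number of episodes.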

The resulting learning curve is not what I would expect. It looks as if either the noise variance is reset at the start of each run or the experience buffer from the previous runs is not being used. The reward should reach approximately 1500.

Does anybody have an idea why the curve looks like this? Do you have any advice on how to adjust the hyperparameters?

0 Comments

Answers (0)

Release

R2021a
