Reinforcement learning agent simulation is not the same as during training

Hi, I trained a reinforcement learning agent in Simulink for a humanoid robot. Watching the robot configuration output during learning, the result for 'Agent 7' looked good, so I saved that agent and ran a simulation with the code below.
agent = load('Agent7.mat');
simOptions = rlSimulationOptions('MaxSteps', 50);
experience = sim(env,agent.saved_agent,simOptions);
However, the result was different from Agent 7's configuration output during the learning process, so I observed the action signal with a Scope. The graphs during learning and during simulation were different.
The first picture is the learning configuration output (viewing the robot from the left side) and the action.
The second picture is the simulation configuration output and the action.
During learning, the action took discrete values, and this result was fairly satisfactory. However, when the saved agent was simulated, the action values were only constants.
Could you please tell me how to solve this problem?
(I use MATLAB R2020b.)

Answers (1)

Emmanouil Tzorakoleftherakis
Hello,
Please see this post that explains why simulation results may differ during training and after training.
One thing to consider as well in your case is how long you ran the training for. For example, I see you mention 'Agent7' - does this agent correspond to the policy after 7 episodes? If so, it does not matter what the episode simulation showed; that is far too soon to converge to an acceptable policy. As I mention in the link above, what you see during training is a function of many things, including the agent's exploration.
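As a sketch of the "train longer and keep the good agents" idea: you can let training run for many episodes and have MATLAB save candidate agents automatically whenever an episode clears a reward threshold, then simulate the saved (greedy) policy afterwards. The episode counts, reward thresholds, and directory name below are illustrative placeholders; `env` and `agent` are assumed to come from your existing setup.

```matlab
% Train for many episodes and auto-save agents that clear a reward threshold.
% Threshold values here are placeholders - tune them to your reward scale.
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 2000, ...
    'MaxStepsPerEpisode', 500, ...
    'StopTrainingCriteria', 'AverageReward', ...
    'StopTrainingValue', 300, ...
    'SaveAgentCriteria', 'EpisodeReward', ...
    'SaveAgentValue', 250, ...
    'SaveAgentDirectory', 'savedAgents');

trainingStats = train(agent, env, trainOpts);

% After training, simulate the learned policy (no exploration noise is
% added here, which is why sim can look different from training episodes).
simOpts = rlSimulationOptions('MaxSteps', 500);
experience = sim(env, agent, simOpts);
```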
Hope that helps
