Cumulative reward in RL Agent block is 100 times bigger than it should be
I am working on a reinforcement learning project with an RL Agent block in Simulink. I am using a DDPG agent.
- I noticed that the cumulative reward of every episode (displayed in the verbose output, in the performance plot, and as an output of the RL Agent block) is exactly 100 times bigger than the cumulative reward I can produce by integrating the reward signal.
- Furthermore, the Q0 value is converging (in my specific problem) to a value that is 2e-2 times the cumulative reward.
I did not find anything about this in the documentation.
I wonder if this is normal, if it affects the training, or if it depends on the agent options.
Accepted Answer
Yatharth
on 4 Sep 2023
Hi Enrico,
I understand that you observed that the cumulative reward displayed in the verbose output differs by a factor of 100 from the value you expect.
The difference in scale between the cumulative reward displayed in the verbose output, the performance plot, and the RL Agent block output and the cumulative reward obtained by integrating the reward signal is likely due to a scaling factor applied by the DDPG agent.
DDPG agents often use a scaling factor to normalize the rewards during training. This scaling factor is applied to ensure stable learning and to prevent the agent from being overly sensitive to the magnitude of the rewards. As a result, the displayed cumulative reward may be scaled up or down compared to the raw rewards.
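As a quick way to check whether a constant factor separates the two quantities, you can compare the episode reward reported in the training statistics with both the sum and the time integral of the raw reward signal logged from your model. This is only a sketch: "rewardSignal" (a timeseries logged to the workspace) and "trainingStats" (the output of train) are placeholder names for your own variables.
```matlab
% Sketch: compare the reported episode reward with the raw reward signal.
% "rewardSignal" (timeseries logged from the model) and "trainingStats"
% (output of train) are placeholder names, not required toolbox names.
reportedReward   = trainingStats.EpisodeReward(end);            % value printed in the verbose output / plotted
summedReward     = sum(rewardSignal.Data);                      % sum of per-step rewards
integratedReward = trapz(rewardSignal.Time, rewardSignal.Data); % time integral of the reward signal

fprintf('reported / summed     = %.3g\n', reportedReward / summedReward);
fprintf('reported / integrated = %.3g\n', reportedReward / integratedReward);
```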
Regarding the convergence of the Q0 value, it is expected that Q0 will differ from the cumulative reward. Q0 represents the expected discounted future reward from the starting state, while the cumulative reward is the undiscounted sum of the rewards obtained during an episode. These two values can therefore have different scales and interpretations.
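To make the distinction concrete, here is a small sketch comparing the undiscounted episode reward with the discounted return that Q0 tries to estimate, using a logged vector of per-step rewards. The variable names are placeholders, and gamma stands in for your agent's DiscountFactor option.
```matlab
% Sketch: undiscounted episode reward vs. the discounted return that Q0 estimates.
% "rewards" is a placeholder for the per-step rewards of one episode and
% "gamma" stands in for the agent's DiscountFactor.
rewards = rewardSignal.Data(:);                     % per-step rewards of one episode
gamma   = 0.99;                                     % example discount factor

T = numel(rewards);
discountWeights  = gamma.^((0:T-1).');              % [1; gamma; gamma^2; ...]
episodeReward    = sum(rewards);                    % what the episode statistics report
discountedReturn = sum(discountWeights .* rewards); % what Q0 is trained to predict

fprintf('undiscounted episode reward:   %.4g\n', episodeReward);
fprintf('discounted return (Q0 target): %.4g\n', discountedReturn);
```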
In general, the scaling of the cumulative reward and the convergence of the Q0 value should not significantly affect the training process as long as the agent is able to learn and improve its policy based on the rewards and the Q-values. It is important to focus on the relative improvement of the agent's performance over time rather than the absolute values of the rewards or Q-values.
I am attaching a few links that provide insight into the implementation details of DDPG and related algorithms, including the use of reward scaling to stabilize training. While they may not specifically mention the scaling factor you observed, they discuss the general concept of scaling rewards in RL training to ensure stability and reduce sensitivity to reward magnitudes.
- Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., ... & Wierstra, D. (2016). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971. Available at: https://arxiv.org/abs/1509.02971
- Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347. Available at: https://arxiv.org/abs/1707.06347
- OpenAI Spinning Up in Deep RL: Reward Scaling. Available at: https://spinningup.openai.com/en/latest/algorithms/ddpg.html#reward-scaling
I hope this helps.