
Episode Q0 estimate does not converge but I get good results in simulation

Hi, I'm using reinforcement learning on a control problem, specifically a TD3 agent. I have a third-order plant and I want to use RL to find the optimal values for the PI gains, so I'm basing my work on this MATLAB example.
My problem is very similar to the MATLAB example, but instead of a water tank I have to control the input airflow to generate a temperature signal that follows a given temperature reference.
Summarizing:
  1. Action: airflow speed (%)
  2. Observation: error (reference temperature minus measured temperature) and the integral of the error
  3. Reward function (a sketch of this logic is shown after this list):
  • +10 if the error < 0.1, -1 otherwise
  • -1000 if the temperature drops below 0 (episode stop condition)
  • -10 if the action < 50 (this discourages bad states)
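For clarity, here is a minimal sketch of that reward logic, assuming it is computed in a MATLAB Function block inside the environment model; the variable names are placeholders, not my exact signals:

function reward = computeReward(error, temp, action)
% Hypothetical reward computation matching the rules listed above
% error:  reference temperature minus measured temperature
% temp:   measured temperature
% action: airflow speed in percent

% +10 when the tracking error is small, -1 otherwise
if abs(error) < 0.1
    reward = 10;
else
    reward = -1;
end

% large penalty when the temperature goes negative (episode stop condition)
if temp < 0
    reward = reward - 1000;
end

% discourage low airflow actions
if action < 50
    reward = reward - 10;
end
end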
I have 3 neural networks (a sketch of how they could be built is shown after this list):
  1. Actor: a single neuron whose weights are the PI gains.
  2. Critics: the TD3 algorithm uses two critics with the same architecture.
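For illustration, a minimal sketch of how networks like these could be built with the Reinforcement Learning Toolbox; the layer sizes, layer names, and action limits are placeholders, not my exact networks:

% Observation: [error; integral of error]; action: airflow speed in percent
obsInfo = rlNumericSpec([2 1]);
actInfo = rlNumericSpec([1 1],'LowerLimit',0,'UpperLimit',100);

% Actor: one fully connected neuron with a frozen (zero) bias, so that its
% two weights act as the PI gains
actorNet = [
    featureInputLayer(2,'Name','obs')
    fullyConnectedLayer(1,'Name','gains','BiasLearnRateFactor',0)];
actor = rlContinuousDeterministicActor(actorNet,obsInfo,actInfo);

% Critics: both TD3 critics use the same architecture (hidden size is a placeholder)
obsPath = [
    featureInputLayer(2,'Name','obs')
    fullyConnectedLayer(32,'Name','obsFC')];
actPath = [
    featureInputLayer(1,'Name','act')
    fullyConnectedLayer(32,'Name','actFC')];
commonPath = [
    concatenationLayer(1,2,'Name','concat')
    reluLayer('Name','relu')
    fullyConnectedLayer(1,'Name','qValue')];
criticNet = layerGraph(obsPath);
criticNet = addLayers(criticNet,actPath);
criticNet = addLayers(criticNet,commonPath);
criticNet = connectLayers(criticNet,'obsFC','concat/in1');
criticNet = connectLayers(criticNet,'actFC','concat/in2');

% Two critics built from the same graph get independent random initial weights
critic1 = rlQValueFunction(criticNet,obsInfo,actInfo, ...
    'ObservationInputNames','obs','ActionInputNames','act');
critic2 = rlQValueFunction(criticNet,obsInfo,actInfo, ...
    'ObservationInputNames','obs','ActionInputNames','act');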
This is one of the best agents I've trained so far:
  • Learning rate: actor: 0.001, critic: 0.01
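These learning rates could be set through the agent options roughly like this (a sketch; the sample time Ts and the remaining option values are assumptions, and actor/critic1/critic2 come from the sketch above):

% Optimizer options with the learning rates listed above
actorOpts  = rlOptimizerOptions('LearnRate',1e-3,'GradientThreshold',1);
criticOpts = rlOptimizerOptions('LearnRate',1e-2,'GradientThreshold',1);

% Ts (controller sample time) is assumed to be defined elsewhere
agentOpts = rlTD3AgentOptions( ...
    'SampleTime',Ts, ...
    'ActorOptimizerOptions',actorOpts, ...
    'CriticOptimizerOptions',criticOpts, ...
    'DiscountFactor',0.99);

agent = rlTD3Agent(actor,[critic1 critic2],agentOpts);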
I trained this agent for 1000 more episodes, but I got worse simulation results.
Looking at the action signal generated by this RL agent, it seems fairly good in control terms, I think.
The problem here is Episode Q0. According to MATLAB:
For agents with a critic, Episode Q0 is the estimate of the discounted long-term reward at the start of each episode, given the initial observation of the environment. As training progresses, if the critic is well designed, Episode Q0 approaches the true discounted long-term reward, as shown in the preceding figure.
Episode Q0 (yellow line) doesn't approach the episode reward (blue line) or the average reward (red line). So, according to this, my agent is very bad, right? But then why am I getting good results? And how can I fix this? Just by trying another critic architecture, e.g. with more layers?

Accepted Answer

Ayush Modi on 17 Jan 2024
Hi,
I found the following answer in the community regarding Episode Q0: it is not necessary for Episode Q0 to be an indication of the learning quality of the RL agent for actor-critic methods. If you are getting good results, you need not make any changes.
"In general, it is not required for this to happen for actor-critic methods. The actor may converge first, and at that point it would be totally fine to stop training."

