Why is my DDPG agent converging to a state where it gets continuous penalization, when there is a state it could reach with 0 penalization?
I am training a Reinforcement Learning DDPG agent to drive a vehicle to a reference.
The vehicle dynamics are:
- x_dot = v*cos(psi);
- y_dot = v*sin(psi);
- psi_dot = w;
- v_dot = a;
With observations obs = [e_x, e_y, e_psi, e_v] and actions u = [w (psi_dot); a (v_dot)], my DDPG agent fails to reach the reference with 0 error.
Reward at each step: rwd = -(x^T*Q*x + u^T*R*u) (the same form as the LQR cost function, to have a comparison). A sketch of one environment step is shown below.
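To make the setup concrete, here is a minimal sketch of one environment step, assuming Euler integration with a sample time Ts and a constant reference ref = [x_ref; y_ref; psi_ref; v_ref]. The function name, signature, and the Euler discretization are my assumptions for illustration, not necessarily how the original environment is implemented.
```matlab
% Hypothetical environment step: kinematic model from the question plus the
% LQR-like reward. State s = [x; y; psi; v], action u = [w; a].
function [obs, rwd, sNext] = stepVehicle(s, u, ref, Ts, Q, R)
    v = s(4); psi = s(3);

    % Dynamics: x_dot = v*cos(psi), y_dot = v*sin(psi), psi_dot = w, v_dot = a
    sNext = s + Ts * [v*cos(psi); v*sin(psi); u(1); u(2)];

    % Observations are the tracking errors e = [e_x; e_y; e_psi; e_v]
    obs = ref - sNext;

    % Reward at each step: rwd = -(e'*Q*e + u'*R*u)
    rwd = -(obs' * Q * obs + u' * R * u);
end
```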
No matter how I tune the hyper-parameters or make my actor and critic networks more or less complex, the gap is always there.
At some point I tried removing all biases from the neurons of my networks, building actor and critic networks that have only weights, and that actually solved the problem. All the trainings, with different hyperparameters, drove the error to 0.
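For reference, one way to build such a bias-free network in MATLAB is to fix each fully connected layer's bias at zero and freeze it so training never updates it. The layer sizes and names below are illustrative assumptions, not the actual networks from the post:
```matlab
% Illustrative bias-free actor: Bias = 0 and BiasLearnRateFactor = 0 keep the
% bias at zero during training, so zero error input maps to zero action output.
obsDim = 4;   % [e_x; e_y; e_psi; e_v]
actDim = 2;   % [w; a]
hidden = 64;  % assumed hidden layer width

actorLayers = [
    featureInputLayer(obsDim, 'Name', 'obs')
    fullyConnectedLayer(hidden, 'Name', 'fc1', ...
        'Bias', zeros(hidden,1), 'BiasLearnRateFactor', 0)
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(hidden, 'Name', 'fc2', ...
        'Bias', zeros(hidden,1), 'BiasLearnRateFactor', 0)
    reluLayer('Name', 'relu2')
    fullyConnectedLayer(actDim, 'Name', 'act', ...
        'Bias', zeros(actDim,1), 'BiasLearnRateFactor', 0)
    ];
```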
What I wanted to ask is:
1 - Why does removing the biases solve the problem?
2 - Despite driving the errors to 0, removing the bias terms resulted in degraded performance compared to an agent with bias terms that I luckily obtained by stopping the training at a moment when the weights happened to drive the error to 0 (if I had let the training run for 10 more episodes, the gap would have appeared again, which is why I can't use that agent; there is no consistency). How can I get an agent with bias terms to drive the error to 0?
I would really appreciate it if anyone could answer this, because I can't seem to find an explanation.
Answers (1)
Emmanouil Tzorakoleftherakis
on 20 Feb 2024
My guess is that this happens due to the specifics of the problem. You want to build a controller that generates zero actions when the error inputs are zero. Removing the biases happens to make this much easier, assuming your actor is a feedforward net: think of Y = W*X + B. If X is close to zero, Y will be close to zero even if W is not perfectly optimized; B, however, shifts the whole signal.
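A tiny numeric illustration of this point, with arbitrary example numbers:
```matlab
% With no bias, zero error in -> zero action out; a bias term shifts the
% output even when the error is exactly zero.
W = [0.8 -0.3; 0.1 0.5];   % arbitrary, imperfectly "tuned" weights
B = [0.2; -0.1];           % small leftover bias after training
e = [0; 0];                % zero tracking error at the reference

uNoBias   = W * e          % = [0; 0]      -> vehicle stays on the reference
uWithBias = W * e + B      % = [0.2; -0.1] -> nonzero command, steady-state error
```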
By the way, your reference here is constant - it would be much harder to achieve the same with a time-varying reference. In general it is much harder to consistently achieve zero tracking error with RL compared to a more traditional controller, because you would need to do a lot of training on 'low-error' inputs.