Is it possible to use the reinforcement learning toolbox in a Simulink/Adams co-simulation?

I gave my Adams model a step input in a Simulink co-simulation. The co-simulation turned out just fine, with the animation behaving as expected. I then tried to implement reinforcement learning on my model, following this example: https://www.mathworks.com/help/reinforcement-learning/ug/quadruped-robot-locomotion-using-ddpg-agent.html. I got the error shown in the picture below.
So is it because there is something wrong with my code, or is it not possible to use the Reinforcement Learning Toolbox in a co-simulation? Thank you!
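For context, my setup roughly follows the workflow in that example. Here is a minimal sketch of the environment and agent creation (the model name, agent block path, signal dimensions, and training options are placeholders, not my exact code):

% Minimal sketch: RL environment and agent around the Adams co-simulation
% model. 'adams_cosim_model' and the specs below are assumed placeholders.
mdl = 'adams_cosim_model';
agentBlk = [mdl '/RL Agent'];

obsInfo = rlNumericSpec([4 1]);                      % assumed 4 observations
actInfo = rlNumericSpec([1 1], ...
    'LowerLimit', -1, 'UpperLimit', 1);              % assumed 1 bounded action

env = rlSimulinkEnv(mdl, agentBlk, obsInfo, actInfo);

% Default DDPG agent (newer releases; older ones need explicit actor/critic)
agent = rlDDPGAgent(obsInfo, actInfo);

trainOpts = rlTrainingOptions('MaxEpisodes', 500, 'MaxStepsPerEpisode', 1000);
trainResults = train(agent, env, trainOpts);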
  2 Comments
xw x
xw x on 3 Nov 2022
Hello, I have the same doubt. May I ask if you have solved your problem? The delay block does let the training continue, but then the reward from the environment is delayed before it reaches the agent; is that correct? In addition, my training results are very unsatisfactory. Could this be the cause?


Accepted Answer

Emmanouil Tzorakoleftherakis
Hello,
You should be able to use Reinforcement Learning Toolbox for co-simulation. It looks like closing the loop with observations and rewards creates an algebraic loop somewhere. Since the Adams plant is wrapped in an S-function, I would check the connections between that S-function and the RL Agent block (observations, actions, reward). You should be able to get rid of the error by adding a delay block.
Please take a look at the documentation links that go over algebraic loops and how to remove them.
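For example, here is a minimal sketch of inserting the delay programmatically (the model name, the 'Reward Calc' block, and the port numbers are assumptions about your layout; dropping a Unit Delay block in by hand works just as well):

% Sketch: insert a Unit Delay (1/z) on the reward line feeding the RL Agent
% block to break the algebraic loop. Model and block names are assumptions.
mdl = 'adams_cosim_model';
load_system(mdl);

% Remove the direct connection from the reward computation to the RL Agent
% (the reward is typically the second input port of the RL Agent block)
delete_line(mdl, 'Reward Calc/1', 'RL Agent/2');

% Add a Unit Delay and reroute the reward signal through it
add_block('simulink/Discrete/Unit Delay', [mdl '/Reward Delay']);
add_line(mdl, 'Reward Calc/1', 'Reward Delay/1');
add_line(mdl, 'Reward Delay/1', 'RL Agent/2');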
  3 Comments
chengye he
chengye he on 20 Dec 2020
Hello Mr. Tzorakoleftherakis,
Thank you. After adding a unit delay block between the action and the plant, the agent starts to learn just fine. However, I am not sure this is the right solution. In the quadruped example, the unit delay block is introduced in order to feed back the torque from the previous time step. So doesn't adding a delay between the plant and the agent (observations, actions, reward, as you mentioned) make the whole learning process depend on delayed signals? In my case, it seems the plant is fed with the action from the previous time step. I wonder whether this changes the behavior of the system in an unintended way, or whether the agent eventually learns that there is a delay in the system so I don't need to worry about it.
I also tried to resolve the algebraic loop with the other suggestions, such as adding an IC block on the plant output and enabling "Minimize algebraic loop occurrences"; the first did not solve the problem and the second couldn't be found in the subsystem properties. So am I safe to move on and work on the reward function, or should I keep working on the system? Here is what my system looks like now, with a 1/z delay between the action and the plant. Big thanks for your time.
Emmanouil Tzorakoleftherakis
Edited: Emmanouil Tzorakoleftherakis on 21 Dec 2020
Hello,
I would say the delay block should go on the reward signal, right before it enters the RL Agent block (and possibly also on the other observations and on the IsDone signal). If you delay the actions, it will likely mess up training.
If delaying the above does not work, another option is to fix the algebraic loop from within the Adams S-function. Check the direct feedthrough section here. This link should also be helpful.
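To illustrate the direct feedthrough idea, here is a toy Level-2 MATLAB S-function sketch (the real Adams plant block is a generated C-MEX S-function, so this is only an illustration of the flag, not code to drop into your model):

function toy_plant_sfcn(block)
% Toy Level-2 MATLAB S-function. The key point is the DirectFeedthrough
% flag: when it is false, Simulink knows the output does not depend on the
% current input, so closing the loop does not create an algebraic loop.
setup(block);
end

function setup(block)
block.NumInputPorts  = 1;                   % action coming from the agent
block.NumOutputPorts = 1;                   % observation going back out
block.SetPreCompInpPortInfoToDynamic;
block.SetPreCompOutPortInfoToDynamic;
block.InputPort(1).Dimensions  = 1;
block.OutputPort(1).Dimensions = 1;

% No direct feedthrough from input to output
block.InputPort(1).DirectFeedthrough = false;

block.SampleTimes = [0.001 0];              % assumed co-simulation step size
block.RegBlockMethod('Outputs', @Outputs);
end

function Outputs(block)
% Output is computed without reading the current input
block.OutputPort(1).Data = 0;               % placeholder plant output
end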



Release: R2020a
