Is it possible to use the reinforcement learning toolbox in a Simulink/Adams co-simulation?

I gave my Adams model a step input in a Simulink co-simulation. The co-simulation turned out just fine, with the animation behaving as expected. I then tried to implement reinforcement learning on my model, following this example: https://www.mathworks.com/help/reinforcement-learning/ug/quadruped-robot-locomotion-using-ddpg-agent.html. I got the error shown in the picture below.
So is it because there is something wrong with my code, or is it not possible to use the Reinforcement Learning Toolbox in a co-simulation? Thank you!
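For context, my setup roughly follows the workflow in that example. Here is a minimal sketch of the environment and agent creation (the model name, agent block path, signal dimensions, and training options are placeholders, not my exact code):

% Minimal sketch: RL environment and agent around the Adams co-simulation
% model. 'adams_cosim_model' and the specs below are assumed placeholders.
mdl = 'adams_cosim_model';
agentBlk = [mdl '/RL Agent'];

obsInfo = rlNumericSpec([4 1]);                      % assumed 4 observations
actInfo = rlNumericSpec([1 1], ...
    'LowerLimit', -1, 'UpperLimit', 1);              % assumed 1 bounded action

env = rlSimulinkEnv(mdl, agentBlk, obsInfo, actInfo);

% Default DDPG agent (newer releases; older ones need explicit actor/critic)
agent = rlDDPGAgent(obsInfo, actInfo);

trainOpts = rlTrainingOptions('MaxEpisodes', 500, 'MaxStepsPerEpisode', 1000);
trainResults = train(agent, env, trainOpts);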
  2 Comments
xw x
xw x on 3 Nov 2022
Hello, I have the same doubt. May I ask if you have solved your problem? The delay block does let the training continue, but then the reward from the environment is delayed before it reaches the agent; is that correct? In addition, my training results are very unsatisfactory. Could this be the cause?


Accepted Answer

Emmanouil Tzorakoleftherakis
Hello,
You should be able to use Reinforcement Learning Toolbox for co-simulation. It looks like closing the loop with observations and rewards creates an algebraic loop somewhere. Since the Adams plant is wrapped in an S-function, I would check the connections between that S-function and the RL Agent block (observations, actions, reward). You should be able to get rid of the error by adding a delay block.
Please take a look at the documentation links that go over algebraic loops and how to remove them.
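For example, here is a minimal sketch of inserting the delay programmatically (the model name, the 'Reward Calc' block, and the port numbers are assumptions about your layout; dropping a Unit Delay block in by hand works just as well):

% Sketch: insert a Unit Delay (1/z) on the reward line feeding the RL Agent
% block to break the algebraic loop. Model and block names are assumptions.
mdl = 'adams_cosim_model';
load_system(mdl);

% Remove the direct connection from the reward computation to the RL Agent
% (the reward is typically the second input port of the RL Agent block)
delete_line(mdl, 'Reward Calc/1', 'RL Agent/2');

% Add a Unit Delay and reroute the reward signal through it
add_block('simulink/Discrete/Unit Delay', [mdl '/Reward Delay']);
add_line(mdl, 'Reward Calc/1', 'Reward Delay/1');
add_line(mdl, 'Reward Delay/1', 'RL Agent/2');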
  3 Comments
chengye he
chengye he on 20 Dec 2020
Hello Mr. Tzorakoleftherakis,
Thank you. After adding a unit delay block between the action and the plant, the agent starts to learn just fine. However, I am not sure this is the right solution. In the quadruped example, the unit delay block is introduced in order to feed back the torque from the previous time step. So doesn't adding a delay between the plant and the agent (observations, actions, reward, as you mentioned) make the whole learning process depend on delayed signals? In my case, it seems the plant is fed with the action from the previous time step. I wonder whether this changes the behavior of the system in an unintended way, or whether the agent eventually learns that there is a delay in the system so I don't need to worry about it.
I also tried to resolve the algebraic loop with the other suggestions, such as adding an IC block on the plant output and enabling "Minimize algebraic loop occurrences"; the first did not solve the problem and the second couldn't be found in the subsystem properties. So am I safe to move on and work on the reward function, or should I keep working on the system? Here is what my system looks like now, with a 1/z delay between the action and the plant. Big thanks for your time.
Emmanouil Tzorakoleftherakis
Edited: Emmanouil Tzorakoleftherakis on 21 Dec 2020
Hello,
I would say the delay block should go on the reward signal, right before it enters the RL Agent block (and possibly also on the other observations and on the IsDone signal). If you delay the actions, it will likely mess up training.
If delaying the above does not work, another option is to fix the algebraic loop from within the Adams S-function. Check the direct feedthrough section here. This link should also be helpful.
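To illustrate the direct feedthrough idea, here is a toy Level-2 MATLAB S-function sketch (the real Adams plant block is a generated C-MEX S-function, so this is only an illustration of the flag, not code to drop into your model):

function toy_plant_sfcn(block)
% Toy Level-2 MATLAB S-function. The key point is the DirectFeedthrough
% flag: when it is false, Simulink knows the output does not depend on the
% current input, so closing the loop does not create an algebraic loop.
setup(block);
end

function setup(block)
block.NumInputPorts  = 1;                   % action coming from the agent
block.NumOutputPorts = 1;                   % observation going back out
block.SetPreCompInpPortInfoToDynamic;
block.SetPreCompOutPortInfoToDynamic;
block.InputPort(1).Dimensions  = 1;
block.OutputPort(1).Dimensions = 1;

% No direct feedthrough from input to output
block.InputPort(1).DirectFeedthrough = false;

block.SampleTimes = [0.001 0];              % assumed co-simulation step size
block.RegBlockMethod('Outputs', @Outputs);
end

function Outputs(block)
% Output is computed without reading the current input
block.OutputPort(1).Data = 0;               % placeholder plant output
end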



Release: R2020a
