Variable Sample Time in Reinforcement Learning

Hey there,
I created a Simulink model that takes input values from an RL Agent. A MATLAB Function block processes these values and outputs a set of actions that the model shall perform. This set has a variable length.
While this action set is being performed, the reward function shall still be calculated, but the agent shall not output new values, as they are not taken into account by the MATLAB Function block.
Maybe an example makes it clearer:
  • The agent outputs a value, let's say 5.
  • Based on the current state of the model, an action set is created. If the current state of the model is 2, the created action set would consist of the steps 2, 3, 4 and 5. These action sets are of variable length and contain continuous numbers between a lower and an upper boundary, so something like 2.7 or 4.01 is possible as well.
  • While these actions are being performed, the model does not react to any values the agent outputs. Performing a set takes the model between 0.1 and about 30 seconds.
  • The state of the model while performing the set of actions must be evaluated by the reward function.
  • When the model has finished this set of actions, it is ready to take the next value from the agent.
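For illustration, the expansion described in the second bullet might look roughly like this inside the MATLAB Function block (the function name, the `stepSize` parameter, and the fixed-step logic are my assumptions, not your actual implementation):

```matlab
function actionSet = expandAction(currentState, agentValue, stepSize)
% Sketch: build a variable-length set of intermediate steps from the
% current state toward the value suggested by the agent.
% Example: currentState = 2, agentValue = 5, stepSize = 1 -> [2 3 4 5]
numSteps  = ceil(abs(agentValue - currentState) / stepSize);
actionSet = currentState + sign(agentValue - currentState) * stepSize * (0:numSteps);
actionSet(end) = agentValue;   % land exactly on the agent's value
end
```

With a non-integer target such as 4.01, the same logic produces a set like [2 3 4 4.01], which matches the continuous, bounded actions you describe.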
Currently I have a sample time of 0.1 s, which is the shortest amount of time a set of actions can keep the model busy.
If the first action set takes the model 10 seconds to react to, there are 99 suggestions from the RL Agent that the model does not react to. I fear that this might lead to pretty bad training.
I need the agent to output a value, wait for feedback from the model, and then output the next value, while the reward function keeps a higher resolution. Is something like this possible?
Thank you.
  1 Comment
Niklas Braun on 10 Dec 2020
Currently I use a Sample and Hold block to hold the observations while the model is busy. The reward function still gets live values, but I am pretty sure there is a better way of doing it, as the actor still outputs its actions every 0.1 s.


Accepted Answer

Emmanouil Tzorakoleftherakis
Hello,
The current training workflow in Reinforcement Learning Toolbox assumes that actions are taken at fixed time intervals, as you mentioned. We are looking into expanding this, but we do not have an example we can share as of right now.
One thing you may want to look at is putting the RL Agent block inside a triggered/enabled subsystem to force the agent to make decisions at varying intervals. I am not sure this will work, but it is worth a shot.
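As a rough sketch of that idea (the model name `myRLModel` and the wiring are assumptions about your setup, not toolbox API): create a Triggered Subsystem, move the RL Agent block inside it, and drive the trigger port from a "done" signal that your MATLAB Function block raises when the current action set has finished:

```matlab
% Hypothetical sketch: add a triggered subsystem so the agent only
% computes a new action when the model signals it is ready.
mdl = 'myRLModel';                 % assumed model name
open_system(mdl);
add_block('simulink/Ports & Subsystems/Triggered Subsystem', ...
    [mdl '/AgentTrigger']);
% Fire on a rising edge of the "action set finished" signal that would
% be wired into the trigger port of AgentTrigger.
set_param([mdl '/AgentTrigger/Trigger'], 'TriggerType', 'rising');
% The RL Agent block would then be moved inside AgentTrigger, with its
% sample time inherited from the trigger rather than fixed at 0.1 s.
```

Note that blocks inside a triggered subsystem execute only when the trigger fires, so the reward/observation signals should stay outside the subsystem if they need to be computed at every base sample hit.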
  2 Comments
Aniruddha Datta on 30 Jun 2021
I just tried this. I am trying to force the agent to take an action only on a triggered instance, but the RL Agent block does not accept -1 (inherited) sample time, and the triggered subsystem does not accept any other sample time.


More Answers (0)
