Variable Sample Time in Reinforcement Learning

Hey there,
I created a Simulink model that takes input values from an RL Agent. A MATLAB Function block processes these values and outputs a set of actions that the model shall perform. This set has a variable length.
While this action set is being performed, the reward function shall still be calculated, but the agent shall not output new values, as they are not taken into account by the MATLAB Function block.
Maybe an example makes it clearer:
  • The agent outputs a value, let's say 5.
  • Based on the current state of the model, an action set is created. If the current state of the model is 2, the created action set would consist of the steps 2, 3, 4 and 5. These action sets are of variable length and contain continuous numbers between a lower and an upper boundary, so something like 2.7 or 4.01 is possible as well.
  • While these actions are being performed, the model does not react to any values the agent outputs. Performing a set takes the model between 0.1 and about 30 seconds.
  • The state of the model while performing the set of actions must be evaluated by the reward function.
  • When the model has finished this set of actions, it is ready to take the next value from the agent.
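For illustration, the expansion described in the second bullet might look roughly like this inside the MATLAB Function block (the function name, the `stepSize` parameter, and the fixed-step logic are my assumptions, not your actual implementation):

```matlab
function actionSet = expandAction(currentState, agentValue, stepSize)
% Sketch: build a variable-length set of intermediate steps from the
% current state toward the value suggested by the agent.
% Example: currentState = 2, agentValue = 5, stepSize = 1 -> [2 3 4 5]
numSteps  = ceil(abs(agentValue - currentState) / stepSize);
actionSet = currentState + sign(agentValue - currentState) * stepSize * (0:numSteps);
actionSet(end) = agentValue;   % land exactly on the agent's value
end
```

With a non-integer target such as 4.01, the same logic produces a set like [2 3 4 4.01], which matches the continuous, bounded actions you describe.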
Currently I have a sample time of 0.1 s, which is the shortest amount of time a set of actions can keep the model busy.
If the first action set takes the model 10 seconds to react to, there are 99 suggestions from the RL Agent that the model does not react to. I fear that this might lead to pretty bad training.
I need the agent to output a value, wait for feedback from the model, and then output the next value, while the reward function keeps a higher resolution. Is something like this possible?
Thank you.
  1 Comment
Niklas Braun on 10 Dec 2020
Currently I use a Sample and Hold block to hold the observations while the model is busy. The reward function still gets live values, but I am pretty sure there is a better way of doing it, as the actor still outputs its actions every 0.1 s.


Accepted Answer

Emmanouil Tzorakoleftherakis
Hello,
The current training workflow in Reinforcement Learning Toolbox assumes that actions are taken at fixed time intervals, as you mentioned. We are looking into expanding this, but we do not have an example we can share as of right now.
One thing you may want to look at is putting the RL Agent block inside a triggered/enabled subsystem to force the agent to make decisions at varying intervals. I am not sure this will work, but it is worth a shot.
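As a rough sketch of that idea (the model name `myRLModel` and the wiring are assumptions about your setup, not toolbox API): create a Triggered Subsystem, move the RL Agent block inside it, and drive the trigger port from a "done" signal that your MATLAB Function block raises when the current action set has finished:

```matlab
% Hypothetical sketch: add a triggered subsystem so the agent only
% computes a new action when the model signals it is ready.
mdl = 'myRLModel';                 % assumed model name
open_system(mdl);
add_block('simulink/Ports & Subsystems/Triggered Subsystem', ...
    [mdl '/AgentTrigger']);
% Fire on a rising edge of the "action set finished" signal that would
% be wired into the trigger port of AgentTrigger.
set_param([mdl '/AgentTrigger/Trigger'], 'TriggerType', 'rising');
% The RL Agent block would then be moved inside AgentTrigger, with its
% sample time inherited from the trigger rather than fixed at 0.1 s.
```

Note that blocks inside a triggered subsystem execute only when the trigger fires, so the reward/observation signals should stay outside the subsystem if they need to be computed at every base sample hit.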
  2 Comments
Aniruddha Datta on 30 Jun 2021
I just tried this. I am trying to force the agent to take an action only on a triggered instance, but the RL Agent block does not accept -1 (inherited) sample time, and the triggered subsystem does not accept any other sample time.


More Answers (0)
