RL: Continuous action space, but within a desired range

I am trying to use PPO for RL training with a continuous action space.
However, I want my actor's output to always stay within a certain range (e.g., only between 0 and 1). I tried mapping/clipping any out-of-bound actions into the range, but the performance was not good. Are there other ways to tackle this situation? Thank you.

Answers (1)

Emmanouil Tzorakoleftherakis
Hello,
There are two ways to enforce this:
1) Use the upper and lower limits in rlNumericSpec when you create the action space.
2) Add a tanh layer followed by a scaling layer in the "mean" path of your actor network, as shown in this example (see the sketch after this list). This way you can scale the mean value to the desired range.
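A minimal sketch of both options together (layer names and sizes are illustrative, not from the original answer):

% 1) Action spec with explicit limits
actInfo = rlNumericSpec([1 1],'LowerLimit',0,'UpperLimit',1);
% 2) Mean path ending in tanh + scaling: tanh outputs [-1, 1],
%    and Scale/Bias map that interval to [LowerLimit, UpperLimit]
scale = (actInfo.UpperLimit - actInfo.LowerLimit)/2;   % 0.5 here
bias  = (actInfo.UpperLimit + actInfo.LowerLimit)/2;   % 0.5 here
meanPath = [
    fullyConnectedLayer(1,'Name','fcMean')
    tanhLayer('Name','tanhMean')
    scalingLayer('Name','meanOut','Scale',scale,'Bias',bias)];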
  4 Comments
Ammad Sadaqat on 10 Nov 2021
Edited: Ammad Sadaqat on 10 Nov 2021
Hello,
I tried using:
rlNumericSpec([1 1],'LowerLimit',0,'UpperLimit',1)
But I am still facing the same issue: the action values are not bounded between 0 and 1, even though I also use a tanh layer followed by a scaling layer.
Another option is to bound the actions in the Simulink environment, e.g. set all values less than 0 to 0 and all values greater than 1 to 1, leaving values already between 0 and 1 unchanged. My concern is that this would still be inefficient in terms of simulation time. I would really appreciate it if someone could suggest a possible solution for this.
Thanks in advance!
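A saturation of this kind is typically very cheap: in Simulink it is a single Saturation block on the action signal, and in a MATLAB environment it is one min/max per step, which should be negligible next to the plant simulation itself. A minimal sketch (variable names and values are illustrative):

% Saturate a sampled action into [0, 1] before applying it to the plant
action = 1.7;                  % example out-of-bound sample
u = min(max(action, 0), 1)     % returns 1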
Francisco Serra on 13 Nov 2023
Edited: Francisco Serra on 13 Nov 2023
Hello. I am trying to control a dynamical system p̈ = u, driving p to 0.
For that I am using an rlPPOAgent. I want the actions to be bounded by -10 and 10 (-10 < u < 10).
If the actor samples from a Gaussian distribution in which the mean and standard deviation are given by the neural network, how can we ensure boundedness? The rlNumericSpec is only a way to store the limits, but does nothing in practical terms, right? I tried applying a tanh activation function to the meanPath of my actor to squash the values to [-1, 1] and then a scaling layer to scale them to [-10, 10]:
meanPath = [
    fullyConnectedLayer(16, 'Name', 'meanPathIn')
    reluLayer('Name', 'relu5')
    fullyConnectedLayer(numAct, 'Name', 'fc6')
    tanhLayer('Name', 'tanhStd')
    scalingLayer('Name', 'meanPathOut', ...
        'Scale', ainfo.UpperLimit)];   % ainfo is the rlNumericSpec defined above
The standard deviation path only has a ReLU layer to enforce non-negativity.
The way I see it, I am bounding the mean value to this interval, but the actual sampled action can still fall outside these bounds.
Here is one episode of my training process, in which we can see that the control input u isn't bounded!
Can somebody help?
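The observation above is correct: with a Gaussian policy, bounding the mean does not bound the sample, and in the releases discussed here the rlNumericSpec limits are not enforced by the PPO agent itself. A common workaround is to saturate the raw action inside the environment before it drives the plant, so the policy effectively learns a clipped Gaussian. A minimal sketch of one Euler step of the double integrator p̈ = u with this clipping (dt, p, pd and the numeric values are illustrative, not from the post):

% One simulation step with the sampled action clipped to [-10, 10]
dt = 0.01;                        % assumed sample time
p = 1; pd = 0;                    % example position and velocity
action = 14.3;                    % example out-of-bound Gaussian sample
u  = max(min(action, 10), -10);   % plant only ever sees |u| <= 10
pd = pd + u*dt;                   % velocity update (p_ddot = u)
p  = p + pd*dt;                   % position update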


Release

R2020b
