RL: Continuous action space, but within a desired range

I am trying to use PPO for RL training with a continuous action space.
However, I want my actor's output to always stay within a certain range (e.g., only between 0 and 1). I tried mapping/clipping any out-of-bound actions into the range, but the performance was not good. Are there other ways to tackle this situation? Thank you.

Answers (1)

Emmanouil Tzorakoleftherakis
Hello,
There are two ways to enforce this:
1) Use the upper and lower limits in rlNumericSpec when you create the action space.
2) Add a tanh layer followed by a scaling layer in the "mean" path of your actor network, as shown in this example (see the sketch after this list). This way you can scale the mean value to the desired range.
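A minimal sketch of both options together (layer names and sizes are illustrative, not from the original answer):

% 1) Action spec with explicit limits
actInfo = rlNumericSpec([1 1],'LowerLimit',0,'UpperLimit',1);
% 2) Mean path ending in tanh + scaling: tanh outputs [-1, 1],
%    and Scale/Bias map that interval to [LowerLimit, UpperLimit]
scale = (actInfo.UpperLimit - actInfo.LowerLimit)/2;   % 0.5 here
bias  = (actInfo.UpperLimit + actInfo.LowerLimit)/2;   % 0.5 here
meanPath = [
    fullyConnectedLayer(1,'Name','fcMean')
    tanhLayer('Name','tanhMean')
    scalingLayer('Name','meanOut','Scale',scale,'Bias',bias)];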
  4 Comments
Ammad Sadaqat on 10 Nov 2021
Edited: Ammad Sadaqat on 10 Nov 2021
Hello,
I tried using:
rlNumericSpec([1 1],'LowerLimit',0,'UpperLimit',1)
But I am still facing the same issue: the action values are not bounded between 0 and 1, even though I also use a tanh layer followed by a scaling layer.
Another option is to bound the actions in the Simulink environment, e.g. set all values less than 0 to 0 and all values greater than 1 to 1, leaving values already between 0 and 1 unchanged. My concern is that this would still be inefficient in terms of simulation time. I would really appreciate it if someone could suggest a possible solution for this.
Thanks in advance!
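A saturation of this kind is typically very cheap: in Simulink it is a single Saturation block on the action signal, and in a MATLAB environment it is one min/max per step, which should be negligible next to the plant simulation itself. A minimal sketch (variable names and values are illustrative):

% Saturate a sampled action into [0, 1] before applying it to the plant
action = 1.7;                  % example out-of-bound sample
u = min(max(action, 0), 1)     % returns 1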
Francisco Serra on 13 Nov 2023
Edited: Francisco Serra on 13 Nov 2023
Hello. I am trying to control a dynamical system p̈ = u, driving p to 0.
For that I am using an rlPPOAgent. I want the actions to be bounded by -10 and 10 (-10 < u < 10).
If the actor samples from a Gaussian distribution in which the mean and standard deviation are given by the neural network, how can we ensure boundedness? The rlNumericSpec is only a way to store the limits, but does nothing in practical terms, right? I tried applying a tanh activation function to the meanPath of my actor to squash the values to [-1, 1] and then a scaling layer to scale them to [-10, 10]:
meanPath = [
    fullyConnectedLayer(16, 'Name', 'meanPathIn')
    reluLayer('Name', 'relu5')
    fullyConnectedLayer(numAct, 'Name', 'fc6')
    tanhLayer('Name', 'tanhStd')
    scalingLayer('Name', 'meanPathOut', ...
        'Scale', ainfo.UpperLimit)];   % ainfo is the rlNumericSpec defined above
The standard deviation path only has a ReLU layer to enforce non-negativity.
The way I see it, I am bounding the mean value to this interval, but the actual sampled action can still fall outside these bounds.
Here is one episode of my training process, in which we can see that the control input u isn't bounded!
Can somebody help?
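The observation above is correct: with a Gaussian policy, bounding the mean does not bound the sample, and in the releases discussed here the rlNumericSpec limits are not enforced by the PPO agent itself. A common workaround is to saturate the raw action inside the environment before it drives the plant, so the policy effectively learns a clipped Gaussian. A minimal sketch of one Euler step of the double integrator p̈ = u with this clipping (dt, p, pd and the numeric values are illustrative, not from the post):

% One simulation step with the sampled action clipped to [-10, 10]
dt = 0.01;                        % assumed sample time
p = 1; pd = 0;                    % example position and velocity
action = 14.3;                    % example out-of-bound Gaussian sample
u  = max(min(action, 10), -10);   % plant only ever sees |u| <= 10
pd = pd + u*dt;                   % velocity update (p_ddot = u)
p  = p + pd*dt;                   % position update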


Release

R2020b
