syncParameters

Modify the learnable parameters of one approximator towards the learnable parameters of another approximator

Since R2022a

Syntax

zFcnAppx = syncParameters(xFcnAppx,yFcnAppx,smoothFactor)

Description

zFcnAppx = syncParameters(xFcnAppx,yFcnAppx,smoothFactor) returns an updated function approximator object of the same type and configuration as xFcnAppx, but with its learnable parameters moved towards those of yFcnAppx by an amount set by the smooth factor smoothFactor.

Examples

Synchronize Critic Parameter Values

For this example, create two value function critics and sync their parameters.

First, create a finite set observation specification for a scalar that can take four different values.

obsInfo = rlFiniteSetSpec(1:4);

Create a table object. Table values are initialized to zero by default.

table = rlTable(obsInfo);

Create a base critic.

Vx = rlValueFunction(table,obsInfo);

Set the table values to different values.

table.Table = [1 -1 -10 100]';

Use the updated table to create a new critic.

Vy = rlValueFunction(table,obsInfo);

Sync the parameter values of the base critic Vx, moving them one fifth of the way towards the parameter values of the new critic Vy.

Vz = syncParameters(Vx,Vy,0.2);

Display the learnable parameters of the new critic Vz.

Vz.Learnables{1}

ans = 
  4×1 dlarray

    0.2000
   -0.2000
   -2.0000
   20.0000
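Because every parameter of Vx is zero, the (1-s)·P_x term of the update vanishes, so each value of Vz should be exactly 0.2 times the corresponding value of Vy. The following sketch rebuilds the critics from this example and checks that relationship directly; it assumes, as the display above suggests, that the Learnables property holds dlarray values.

```matlab
% Rebuild the two critics from the example above.
obsInfo = rlFiniteSetSpec(1:4);
table = rlTable(obsInfo);             % all table values start at zero
Vx = rlValueFunction(table,obsInfo);  % base critic, parameters all zero
table.Table = [1 -1 -10 100]';
Vy = rlValueFunction(table,obsInfo);  % new critic

s = 0.2;
Vz = syncParameters(Vx,Vy,s);

% With P_x = 0, the update s*P_y + (1-s)*P_x reduces to s*P_y.
expected = s*Vy.Learnables{1};

% Largest element-wise deviation; should be zero up to rounding.
maxAbsDiff = max(abs(Vz.Learnables{1} - expected))
```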

Input Arguments

`xFcnAppx` — Base actor or critic object
reinforcement learning function approximator object

Base function approximator object, specified as one of the following:

rlValueFunction object — Value function critic
rlQValueFunction object — Q-value function critic
rlVectorQValueFunction object — Multi-output Q-value function critic with a discrete action space
rlContinuousDeterministicActor object — Deterministic policy actor with a continuous action space
rlDiscreteCategoricalActor object — Stochastic policy actor with a discrete action space
rlContinuousGaussianActor object — Stochastic policy actor with a continuous action space
rlContinuousDeterministicTransitionFunction object — Continuous deterministic transition function for neural network environments
rlContinuousGaussianTransitionFunction object — Continuous Gaussian transition function for neural network environments
rlContinuousDeterministicRewardFunction object — Continuous deterministic reward function for neural network environments
rlContinuousGaussianRewardFunction object — Continuous Gaussian reward function for neural network environments
rlIsDoneFunction object — IsDone function for neural network environments

To create an actor or critic function object, use one of the following methods.

Create the function approximator object directly.
Obtain the existing critic from an agent using getCritic.
Obtain the existing actor from an agent using getActor.
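As a sketch of the getCritic route listed above, the following code extracts the critics of two agents, syncs one towards the other, and writes the result back with setCritic. The observation and action specifications are hypothetical, the two DQN agents use default critics (which typically share an architecture but start from different random weights), and creating default agents likely requires Deep Learning Toolbox.

```matlab
% Hypothetical specifications: 4-element observation, two discrete actions.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 1]);

% Two agents with default critics of the same architecture.
agentA = rlDQNAgent(obsInfo,actInfo);
agentB = rlDQNAgent(obsInfo,actInfo);

% Extract the existing critics from the agents.
criticA = getCritic(agentA);
criticB = getCritic(agentB);

% Move criticA's parameters 10% of the way towards criticB's parameters.
criticA = syncParameters(criticA,criticB,0.1);

% Store the updated critic back in agentA.
agentA = setCritic(agentA,criticA);
```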

`yFcnAppx` — New actor or critic object
reinforcement learning function approximator object

New actor or critic object, specified as a function approximator object whose learnable parameter cell array has the same dimensions as that of xFcnAppx.
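Before calling syncParameters on two approximators built separately, it can help to confirm that their parameter cell arrays line up. The helper below, haveMatchingLearnables, is hypothetical (it is not part of the toolbox); it is a minimal sketch that uses getLearnableParameters to compare the cell array size and the size of each parameter array.

```matlab
function tf = haveMatchingLearnables(xFcnAppx,yFcnAppx)
% HAVEMATCHINGLEARNABLES  Hypothetical helper, not part of the toolbox.
% Returns true if both approximators have learnable parameter cell arrays
% of the same size and each corresponding parameter array has the same size.
px = getLearnableParameters(xFcnAppx);
py = getLearnableParameters(yFcnAppx);

% Compare the cell array sizes first. Because && short-circuits, cellfun
% only runs when the two cell arrays can be paired element by element.
tf = isequal(size(px),size(py)) && ...
     all(cellfun(@(a,b) isequal(size(a),size(b)),px,py),'all');
end
```

For example, guard the update with `if haveMatchingLearnables(Vx,Vy), Vz = syncParameters(Vx,Vy,0.2); end`. The check compares sizes only; it does not verify that the two approximators have the same type.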

`smoothFactor` — Smooth factor
positive scalar smaller than one

Smooth factor, specified as a positive scalar smaller than one. This factor regulates the extent to which the parameters of xFcnAppx are updated towards the parameters of yFcnAppx. The operation is akin to a single step of a first-order low-pass filter applied to the learnable parameters of xFcnAppx.

Specifically, if P_x, P_y, and P_z are the parameter vectors of xFcnAppx, yFcnAppx, and zFcnAppx, respectively, and s is smoothFactor, then:

$$P_z = s\,P_y + (1-s)\,P_x$$
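To see both terms of the formula at work, the sketch below uses a base critic with nonzero values, so that the (1-s)·P_x term contributes, and evaluates the formula directly on the Learnables property for comparison with the output of syncParameters. The table values [2 2 2 2] are arbitrary illustration data.

```matlab
obsInfo = rlFiniteSetSpec(1:4);
table = rlTable(obsInfo);

table.Table = [2 2 2 2]';             % arbitrary nonzero base values
Vx = rlValueFunction(table,obsInfo);
table.Table = [1 -1 -10 100]';
Vy = rlValueFunction(table,obsInfo);

s = 0.5;
Vz = syncParameters(Vx,Vy,s);

% Evaluate P_z = s*P_y + (1-s)*P_x by hand.
Px = Vx.Learnables{1};
Py = Vy.Learnables{1};
Pmanual = s*Py + (1-s)*Px   % expected: [1.5 0.5 -4 51]'

% Values returned by syncParameters, for comparison.
Vz.Learnables{1}
```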

For example, with a smooth factor of 0.5, the parameters of zFcnAppx are the average of the parameters of xFcnAppx and yFcnAppx. In the limiting case s = 1, the formula gives parameters of zFcnAppx equal to those of yFcnAppx.
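The low-pass filter analogy becomes concrete when syncParameters is applied repeatedly towards a fixed yFcnAppx. Subtracting P_y from both sides of the update gives P_k - P_y = (1-s)(P_{k-1} - P_y), so after k steps P_k = P_y + (1-s)^k (P_0 - P_y). With s = 0.2 and k = 10, the remaining gap is 0.8^10, roughly 0.107 of the initial gap. The following sketch runs that loop on the critics from the example and compares the result with the closed form.

```matlab
obsInfo = rlFiniteSetSpec(1:4);
table = rlTable(obsInfo);
Vx = rlValueFunction(table,obsInfo);   % parameters start at zero
table.Table = [1 -1 -10 100]';
Vy = rlValueFunction(table,obsInfo);   % fixed target

s = 0.2;
numSteps = 10;
Vt = Vx;
for k = 1:numSteps
    Vt = syncParameters(Vt,Vy,s);      % one low-pass filter step
end

% Closed form: P_k = P_y + (1-s)^k*(P_0 - P_y).
P0 = Vx.Learnables{1};
Py = Vy.Learnables{1};
Pclosed = Py + (1-s)^numSteps*(P0 - Py);

% Should be zero up to accumulated rounding.
maxAbsDiff = max(abs(Vt.Learnables{1} - Pclosed))
```

A smaller smooth factor makes the parameters of the updated object track yFcnAppx more slowly and more smoothly, which is the usual motivation for soft target updates.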

Output Arguments

`zFcnAppx` — Updated target actor or critic object
reinforcement learning function approximator object

Updated target actor or critic object, returned as a function approximator object of the same type as xFcnAppx. The learnable parameter values of zFcnAppx are a convex combination of those of xFcnAppx and yFcnAppx, weighted by smoothFactor as described for that argument. For example, a smooth factor of 0.5 results in zFcnAppx parameters equal to the average of the parameters of xFcnAppx and yFcnAppx, while the limiting value of 1 gives zFcnAppx parameters equal to those of yFcnAppx.

Version History

Introduced in R2022a
