
syncParameters

Modify the learnable parameters of one approximator toward the learnable parameters of another approximator

Since R2022a

    Description

    updatedFcnAppx = syncParameters(fcnAppx,targetFcnAppx,smoothFactor) returns an updated function approximator object of the same type and configuration as fcnAppx, with its learnable parameters moved toward those of targetFcnAppx by the fraction given in smoothFactor.


    Examples


    For this example, create two value function critics and sync their parameters.

    First, create a finite set observation specification for a scalar that can take four different values.

    obsInfo = rlFiniteSetSpec(1:4);

    Create a table object. Table values are initialized to zero by default.

    table = rlTable(obsInfo);

    Create a base critic.

    Vx = rlValueFunction(table,obsInfo);

    Set the table values to different values.

    table.Table = [1 -1 -10 100]';

    Use the updated table to create a new critic.

    Vy = rlValueFunction(table,obsInfo);

    Sync the parameter values of the base critic Vx, moving them one fifth of the way toward the parameter values of the new critic Vy.

    Vz = syncParameters(Vx,Vy,0.2);

    Display the learnable parameters of the new critic Vz.

    Vz.Learnables{1}
    ans = 
      4×1 dlarray
    
        0.2000
       -0.2000
       -2.0000
       20.0000
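
The displayed values can be checked by hand against the update rule given in the smoothFactor description. The following is a minimal sketch on plain arrays, not on critic objects; the variable names Px, Py, Pz, and s are illustrative and stand for the parameters of Vx, Vy, Vz, and the smooth factor.

```matlab
% Plain-array check of the update rule; no toolbox objects are used.
Px = zeros(4,1);          % parameters of the base critic Vx (table initialized to zero)
Py = [1 -1 -10 100]';     % parameters of the new critic Vy
s  = 0.2;                 % smooth factor passed to syncParameters

% Move Px a fraction s of the way toward Py.
Pz = s*Py + (1 - s)*Px;
disp(Pz)
```

Because Px is all zeros here, each entry of Pz is simply 0.2 times the corresponding entry of Py.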
    
    

    Input Arguments


    fcnAppx: Function approximator, specified as an actor or critic function object.

    To create an actor or critic function object, use one of the following methods.

    • Create a function object directly.

    • Obtain the existing critic from an agent using getCritic.

    • Obtain the existing actor from an agent using getActor.

    Example: critic = rlValueFunction(dlnetwork([featureInputLayer(2) fullyConnectedLayer(10) reluLayer fullyConnectedLayer(1)]),rlNumericSpec([2 1])); creates the rlValueFunction object critic.

    targetFcnAppx: Target function approximator, specified as an actor or critic object whose learnable parameter cell array has the same dimensions as that of fcnAppx.

    Example: critic = rlValueFunction(dlnetwork([featureInputLayer(2) fullyConnectedLayer(10) reluLayer fullyConnectedLayer(1)]),rlNumericSpec([2 1])); creates the rlValueFunction object critic.

    smoothFactor: Smooth factor, specified as a positive scalar less than or equal to one. This factor sets how far the parameters of fcnAppx are moved toward the parameters of targetFcnAppx. The operation is akin to a single step of a first-order low-pass filter applied to the learnable parameters of fcnAppx.

    Specifically, if Pz is the parameter vector of updatedFcnAppx, then:

    Pz = s*Py + (1 - s)*Px

    where s is the value of smoothFactor, and Py and Px are the parameter vectors of targetFcnAppx and fcnAppx, respectively.

    For example, if you use a smooth factor of 1, the parameters of updatedFcnAppx are equal to the parameters of targetFcnAppx. If you use a smooth factor of 0.5, the parameters of updatedFcnAppx are equal to the average of the parameters of targetFcnAppx and fcnAppx.

    Example: 0.2
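
The low-pass filter analogy can be illustrated by applying the same update repeatedly. The following is a minimal sketch on plain arrays under the update rule stated above; it does not call syncParameters, and the names Px, Py, s, nSteps, and gap are illustrative only.

```matlab
% Repeated application of Pz = s*Py + (1 - s)*Px on plain arrays.
Px = zeros(4,1);          % starting parameters
Py = [1 -1 -10 100]';     % target parameters, held fixed
s  = 0.2;                 % smooth factor
nSteps = 20;

gap = zeros(nSteps,1);    % distance to the target after each step
for k = 1:nSteps
    Px = s*Py + (1 - s)*Px;     % one smoothing step
    gap(k) = norm(Py - Px);     % remaining distance to the target
end

% Each step scales the remaining distance by (1 - s).
disp(gap(end) / norm(Py))
```

Each step leaves Py - Px multiplied by (1 - s), so a smaller smooth factor gives slower, smoother tracking of the target, and a smooth factor of 1 reaches the target in one step.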

    Output Arguments


    updatedFcnAppx: Updated function approximator, returned as an actor or critic object of the same type as fcnAppx. Its learnable parameter values are a convex combination of the values in fcnAppx and the values in targetFcnAppx. As specified in the description of smoothFactor, a smooth factor of 1 makes the parameters of updatedFcnAppx equal to those of targetFcnAppx, while a smooth factor of 0.5 makes them equal to the average of the parameters of fcnAppx and targetFcnAppx.
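
The two cases mentioned above can be tried on the table critics from the Examples section. This is a sketch, assuming Reinforcement Learning Toolbox is installed and that the learnable parameters are stored as dlarray objects, as the example output suggests; the names Vhalf, Vfull, pHalf, and pFull are illustrative.

```matlab
% Rebuild the two critics from the Examples section.
obsInfo = rlFiniteSetSpec(1:4);
table = rlTable(obsInfo);
Vx = rlValueFunction(table,obsInfo);      % table values are zero
table.Table = [1 -1 -10 100]';
Vy = rlValueFunction(table,obsInfo);      % table values are [1 -1 -10 100]'

% Smooth factor 0.5: parameters halfway between Vx and Vy.
Vhalf = syncParameters(Vx,Vy,0.5);
% Smooth factor 1: parameters equal to those of Vy.
Vfull = syncParameters(Vx,Vy,1);

% Convert the dlarray parameters to numeric arrays for display.
pHalf = extractdata(Vhalf.Learnables{1});
pFull = extractdata(Vfull.Learnables{1});
disp([pHalf pFull])
```

Per the update rule, the first column should be the element-wise average of the two tables and the second column should match the table of Vy.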

    Version History

    Introduced in R2022a