rlEvaluator

Options for evaluating reinforcement learning agents during training

Since R2023b

Description

Use an rlEvaluator object to specify options to evaluate agents periodically during training. Evaluation options include the type of evaluation statistic, the frequency at which evaluation episodes occur, and whether exploration is allowed during an evaluation episode. To train the agents using the specified evaluation options, pass this object to train.

For more information on training agents, see Train Reinforcement Learning Agents.

Creation

Syntax

evalOpts = rlEvaluator

evalOpts = rlEvaluator(PropertyName=Value)

Description

evalOpts = rlEvaluator returns the evaluator object evalOpts, which contains default options for evaluating an agent during training.

evalOpts = rlEvaluator(PropertyName=Value) creates the evaluator object evalOpts and sets its properties using one or more name-value arguments.

Properties

`EvaluationStatisticType` — Type of evaluation statistic
`"MeanEpisodeReward"` (default) | `"MedianEpisodeReward"` | `"MaxEpisodeReward"` | `"MinEpisodeReward"` | ...

Type of evaluation statistic for each group of NumEpisodes consecutive evaluation episodes, specified as one of these strings:

"MeanEpisodeReward" — Mean of the evaluation episode rewards. This is the default behavior.
"MedianEpisodeReward" — Median of the evaluation episode rewards.
"MaxEpisodeReward" — Maximum of the evaluation episode rewards.
"MinEpisodeReward" — Minimum of the evaluation episode rewards.

In the training results object returned by train, this value is the element of the EvaluationStatistic vector corresponding to the training episode that precedes the group of consecutive evaluation episodes. For more information, see NumEpisodes.

Example: EvaluationStatisticType="MinEpisodeReward"
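
As a rough illustration of how each option reduces one group of evaluation episode rewards to a single value, the following sketch applies base MATLAB functions to a made-up reward vector. The variable names and reward values are hypothetical and are not part of the rlEvaluator API.

```matlab
% Hypothetical rewards from one group of NumEpisodes = 3 evaluation episodes.
episodeRewards = [120 80 100];

meanStat   = mean(episodeRewards);    % corresponds to "MeanEpisodeReward"
medianStat = median(episodeRewards);  % corresponds to "MedianEpisodeReward"
maxStat    = max(episodeRewards);     % corresponds to "MaxEpisodeReward"
minStat    = min(episodeRewards);     % corresponds to "MinEpisodeReward"
```

The minimum is the most pessimistic summary of the group and the maximum the most optimistic. The mean and median fall in between, and the median is less sensitive to a single unusually good or bad episode.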

`NumEpisodes` — Number of consecutive evaluation episodes
`3` (default) | positive integer

Number of consecutive evaluation episodes, specified as a positive integer. After every EvaluationFrequency consecutive training episodes, train runs NumEpisodes consecutive evaluation episodes.

For example, if EvaluationFrequency is 100 and NumEpisodes is 3, then three evaluation episodes are run consecutively after 100 training episodes. These three evaluation episodes are used to calculate a single statistic, specified by EvaluationStatisticType, which is returned as the 100th element of the vector in the EvaluationStatistic property of the rlTrainingResults object returned by train. After 200 training episodes, three new evaluation episodes are run, with their statistic returned in the 200th element of EvaluationStatistic, and so on.

Example: NumEpisodes=5
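
The schedule described above can be sketched with plain MATLAB arithmetic. The sketch assumes a hypothetical training run of 350 episodes that is not stopped early. The variable names are illustrative only.

```matlab
evaluationFrequency = 100;   % training episodes between evaluation groups
numEpisodes = 3;             % evaluation episodes per group
maxTrainingEpisodes = 350;   % hypothetical length of the training run

% Training-episode indices after which an evaluation group runs.
evalIndices = evaluationFrequency:evaluationFrequency:maxTrainingEpisodes;

% Total number of evaluation episodes over the whole run.
totalEvalEpisodes = numel(evalIndices)*numEpisodes;

disp(evalIndices)
disp(totalEvalEpisodes)
```

Under these assumptions, evaluation groups follow training episodes 100, 200, and 300, for nine evaluation episodes in total.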

`MaxStepsPerEpisode` — Maximum number of steps to run for an evaluation episode
`[]` (default) | positive integer

Maximum number of steps to run for an evaluation episode, specified as a positive integer. An evaluation episode stops after this many steps unless another termination condition is met first. To accurately assess the stability and performance of the agent, it is often useful to allow more steps in an evaluation episode than in a training episode.

If empty (default), the MaxStepsPerEpisode property specified for training (see rlTrainingOptions) is used.

Example: MaxStepsPerEpisode=1000
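
The following sketch shows one way to reason about the fallback described above. It requires Reinforcement Learning Toolbox. The variable evalStepCap and the if/else logic are illustrative assumptions that mirror the documented behavior, not the internal implementation of train.

```matlab
% Training episodes are capped at 500 steps.
trainOpts = rlTrainingOptions(MaxStepsPerEpisode=500);

% Default evaluator: MaxStepsPerEpisode is empty.
evl = rlEvaluator;

% Illustrative fallback: an empty value means the training cap applies.
if isempty(evl.MaxStepsPerEpisode)
    evalStepCap = trainOpts.MaxStepsPerEpisode;
else
    evalStepCap = evl.MaxStepsPerEpisode;
end

% Allow longer evaluation episodes than training episodes.
evl.MaxStepsPerEpisode = 1000;
```

With the default evaluator, evalStepCap takes the training value of 500. After the last line, evaluation episodes can run for up to 1000 steps while training episodes remain capped at 500.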

`UseExplorationPolicy` — Option to use exploration policy during evaluation episodes
`false` or `0` (default) | `true` or `1`

Option to use exploration policy during evaluation episodes, specified as one of the following logical values.

0 (false) — The agent uses its base greedy policy when selecting actions during an evaluation episode. This is the default behavior.
1 (true) — The agent uses its base exploration policy when selecting actions during an evaluation episode.

`RandomSeeds` — Random seeds used for evaluation episodes
`1` (default) | `[]` | nonnegative integer | vector of nonnegative integers

Random seeds used for evaluation episodes, specified as one of the following.

[] — The random seed is not initialized before an evaluation episode.
Nonnegative integer — The random seed is reinitialized to the specified value before the first of the NumEpisodes consecutive evaluation episodes occurring after EvaluationFrequency training episodes. This is the default behavior, with the seed initialized to 1.
Vector of nonnegative integers with NumEpisodes elements — Before each episode of an evaluation sequence, the random seed is reinitialized to the corresponding element of the specified vector. This guarantees that the ith episode of each evaluation sequence always runs with the same random seed, which helps when comparing evaluation episodes occurring at different stages of training.

The current random seed used for training is stored before the first episode of an evaluation sequence and reset as the current seed after the evaluation sequence. This ensures that the training results with evaluation are the same as the results without evaluation.

Example: RandomSeeds=0
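
The store-and-restore behavior described above can be illustrated with the base MATLAB rng function. This is only a sketch of the idea, with rand standing in for the randomness of an episode. It does not show how train manages the generator internally, and all variable names are hypothetical.

```matlab
rng(0);                    % hypothetical seed used for training
trainingState = rng;       % store the generator settings before evaluation
expectedNext = rand;       % value training would draw next without evaluation

rng(trainingState);        % rewind so the comparison below is fair

evalSeeds = [11 15 20];    % one seed per evaluation episode (NumEpisodes = 3)
evalDraws = zeros(size(evalSeeds));
for i = 1:numel(evalSeeds)
    rng(evalSeeds(i));     % reseed before the ith evaluation episode
    evalDraws(i) = rand;   % stand-in for the randomness in episode i
end

rng(trainingState);        % restore the training generator state
actualNext = rand;         % same value as expectedNext
```

Because the generator state is restored after the evaluation loop, actualNext equals expectedNext, which is why evaluation does not perturb the training sequence. Because each evaluation episode is reseeded with a fixed value, evalDraws is identical every time the loop runs.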

`EvaluationFrequency` — Evaluation period
`100` (default) | positive integer

Evaluation period, specified as a positive integer. This value is the number of consecutive training episodes after which NumEpisodes evaluation episodes are run. For example, if EvaluationFrequency is 100 and NumEpisodes is 3, three evaluation episodes are run consecutively after every 100 training episodes. The default is 100.

Example: EvaluationFrequency=200

Object Functions

Examples

Create Options to Evaluate Agent During Training

Create an rlEvaluator object to evaluate an agent during training.

Configure the evaluator to run five consecutive evaluation episodes every 100 training episodes using fixed random seeds for each evaluation episode.

evl = rlEvaluator( ...
    NumEpisodes=5, ...
    EvaluationFrequency=100, ...
    RandomSeeds=[11,15,20,30,99])

evl = 
  rlEvaluator with properties:

    EvaluationStatisticType: "MeanEpisodeReward"
                NumEpisodes: 5
         MaxStepsPerEpisode: []
       UseExplorationPolicy: 0
                RandomSeeds: [11 15 20 30 99]
        EvaluationFrequency: 100

You can use dot notation to change some of the values. Set the maximum number of steps for evaluation episodes to 1000.

evl.MaxStepsPerEpisode = 1000;

To evaluate an agent during training using these evaluation options, pass evl to train, as in the following code example.

results = train(agent, env, trainingOptions, Evaluator=evl);

For more information, see train.

Version History

Introduced in R2023b
