Options for evaluating reinforcement learning agents during training
Use an rlEvaluator object to specify options to evaluate agents
periodically during training. Evaluation options include the type of evaluation statistic, the
frequency at which evaluation episodes occur, and whether exploration is allowed during an
evaluation episode. To train agents using the specified evaluation options, pass this object to train.
For more information on training agents, see Train Reinforcement Learning Agents.
evalOpts = rlEvaluator returns the evaluator object
evalOpts, which contains default options for evaluating an
agent during training.
evalOpts = rlEvaluator(Name=Value) creates the evaluator object
evalOpts and sets its properties using
one or more name-value arguments.
EvaluationStatisticType — Type of evaluation statistic
"MeanEpisodeReward" (default) | "MedianEpisodeReward" |
"MaxEpisodeReward" | "MinEpisodeReward"
Type of evaluation statistic for a group of
NumEpisodes consecutive evaluation episodes, specified as one of these values.
"MeanEpisodeReward" — Mean value of the evaluation episode rewards. This is the default behavior.
"MedianEpisodeReward" — Median value of the evaluation episode rewards.
"MaxEpisodeReward" — Maximum value of the evaluation episode rewards.
"MinEpisodeReward" — Minimum value of the evaluation episode rewards.
This value is returned by
train as the element of the
EvaluationStatistic vector corresponding to the training episode that
precedes the group of consecutive evaluation episodes. For more information, see the
NumEpisodes and EvaluationFrequency properties.
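For instance, to track worst-case rather than average performance across each evaluation group, you can set the statistic type when constructing the evaluator. A brief sketch using only the options described on this page:

```matlab
% Evaluator that reports the minimum reward over each group of
% consecutive evaluation episodes (worst-case performance).
evalOpts = rlEvaluator(EvaluationStatisticType="MinEpisodeReward");
```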
NumEpisodes — Number of consecutive evaluation episodes
3 (default) | positive integer
Number of consecutive evaluation episodes, specified as a positive integer. After every
EvaluationFrequency training episodes, the training function runs
NumEpisodes consecutive evaluation episodes.
For example, if EvaluationFrequency is 100 and NumEpisodes is
3, then three evaluation
episodes are run, consecutively, after 100 training episodes. These three evaluation
episodes are used to calculate a single statistic, specified by
EvaluationStatisticType, which is returned as the 100th element
of the vector in the
EvaluationStatistic property of the
rlTrainingResults object returned by train.
After 200 training episodes, three new evaluation episodes are run, with their statistic
returned in the 200th element of
EvaluationStatistic, and so on.
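The pattern above can be sketched as follows, assuming an agent agent, an environment env, and training options trainOpts have already been created (these workspace variables are assumptions, not part of this page):

```matlab
% Run 3 consecutive evaluation episodes every 100 training episodes.
evalOpts = rlEvaluator(NumEpisodes=3, EvaluationFrequency=100);

% Train with periodic evaluation. The statistic for each evaluation
% group is stored in the EvaluationStatistic vector of the results.
results = train(agent, env, trainOpts, Evaluator=evalOpts);

% Statistic computed from the evaluation group that ran after the
% first 100 training episodes.
statAfter100 = results.EvaluationStatistic(100);
```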
MaxStepsPerEpisode — Maximum number of steps to run for an evaluation episode
[] (default) | positive integer
Maximum number of steps to run for an evaluation episode, specified as a positive integer. The evaluation episode runs for at most this number of steps if other termination conditions are not met first. To accurately assess the agent's stability and performance, it is often useful to specify a larger number of steps for an evaluation episode than for a training episode.
If empty (default), the
MaxStepsPerEpisode property specified
for training (see
rlTrainingOptions) is used.
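For example, to let evaluation episodes run longer than training episodes, a minimal sketch:

```matlab
% Allow evaluation episodes up to 2000 steps (a value chosen for
% illustration), overriding the training MaxStepsPerEpisode.
evalOpts = rlEvaluator(MaxStepsPerEpisode=2000);
```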
UseExplorationPolicy — Option to use exploration policy during evaluation episodes
false or 0 (default) | true or 1
Option to use exploration policy during evaluation episodes, specified as one of the following logical values.
0 (false) — The agent uses its base greedy policy when selecting actions during an evaluation episode. This is the default behavior.
1 (true) — The agent uses its base exploration policy when selecting actions during an evaluation episode.
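For instance, to evaluate the agent with the same stochastic action selection it uses during training, a brief sketch:

```matlab
% Evaluate using the exploration policy instead of the greedy policy,
% so evaluation rewards reflect stochastic action selection.
evalOpts = rlEvaluator(UseExplorationPolicy=true);
```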
RandomSeeds — Random seeds used for evaluation episodes
1 (default) | -1 | nonnegative integer | vector of nonnegative integers
Random seeds used for evaluation episodes, specified as one of the following.
-1 — The random seed is not initialized before an evaluation episode.
Nonnegative integer — The random seed is reinitialized to the specified value before the first of the
NumEpisodes consecutive evaluation episodes occurring after
EvaluationFrequency training episodes. This is the default behavior, with the seed initialized to 1.
Vector of nonnegative integers with
NumEpisodes elements — Before each episode of an evaluation sequence, the random seed is reinitialized to the corresponding element of the specified vector. This guarantees that the ith episode of each evaluation sequence always runs with the same random seed, which helps when comparing evaluation episodes occurring at different stages of training.
The current random seed used for training is stored before the first episode of an evaluation sequence and reset as the current seed after the evaluation sequence. This ensures that the training results with evaluation are the same as the results without evaluation.
EvaluationFrequency — Evaluation period
100 (default) | positive integer
Evaluation period, specified as a positive integer. It is the number of training episodes after which
NumEpisodes evaluation episodes are run. For example, if
EvaluationFrequency is 100 and NumEpisodes is 3, three evaluation episodes
are run, consecutively, after 100 training episodes. The default is 100.
Create Options to Evaluate Agent During Training
Create an rlEvaluator object to evaluate an agent during training.
Configure the evaluator to run five consecutive evaluation episodes every 100 training episodes using fixed random seeds for each evaluation episode.
evl = rlEvaluator( ...
    NumEpisodes=5, ...
    EvaluationFrequency=100, ...
    RandomSeeds=[11,15,20,30,99])
evl = 
  rlEvaluator with properties:

    EvaluationStatisticType: "MeanEpisodeReward"
                NumEpisodes: 5
         MaxStepsPerEpisode: []
       UseExplorationPolicy: 0
                RandomSeeds: [11 15 20 30 99]
        EvaluationFrequency: 100
You can use dot notation to change some of the values. Set the maximum number of steps for evaluation episodes to 1000.
evl.MaxStepsPerEpisode = 1000;
To evaluate an agent during training using these evaluation options, pass this object to
train, as in the following code example.

results = train(agent, env, trainingOptions, Evaluator=evl);
For more information, see Train Reinforcement Learning Agents.
Introduced in R2023b