rlEpsilonGreedyPolicy
Policy object to generate discrete epsilon-greedy actions for custom training loops
Since R2022a
Description
This object implements an epsilon-greedy policy which, given an input observation,
returns the action that maximizes a discrete action-space Q-value function with
probability 1-Epsilon, and a random action otherwise. You can create an
rlEpsilonGreedyPolicy object from an rlQValueFunction or rlVectorQValueFunction
object, or extract it from an rlQAgent, rlDQNAgent, or rlSARSAAgent. You can then
train the policy object using a custom training loop or deploy it for your application.
If UseEpsilonGreedyAction is set to 0, the policy is deterministic and therefore does
not explore. This object is not compatible with generatePolicyBlock and
generatePolicyFunction. For more information on policies and value functions,
see Create Policies and Value Functions.
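As a language-agnostic illustration of the selection rule described above, the following Python sketch picks an action index from a vector of Q-values. It is not the toolbox implementation: the function name epsilon_greedy_action and its parameters are hypothetical, and the sketch assumes that the random action is drawn uniformly from all available actions. The use_epsilon_greedy flag mirrors the role of UseEpsilonGreedyAction, so setting it to False always yields the greedy action.

```python
import random
from typing import Optional, Sequence


def epsilon_greedy_action(
    q_values: Sequence[float],
    epsilon: float,
    use_epsilon_greedy: bool = True,
    rng: Optional[random.Random] = None,
) -> int:
    """Return the index of the action chosen by an epsilon-greedy rule.

    Hypothetical helper for illustration only (not a toolbox function).
    Assumes the exploratory action is sampled uniformly over all actions.
    """
    if len(q_values) == 0:
        raise ValueError("q_values must contain at least one entry")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    if rng is None:
        rng = random.Random()

    # Greedy choice: index of the largest Q-value (first one on ties).
    greedy = max(range(len(q_values)), key=lambda i: q_values[i])

    # With probability epsilon, explore by picking a uniformly random action.
    if use_epsilon_greedy and rng.random() < epsilon:
        return rng.randrange(len(q_values))

    # Otherwise (probability 1 - epsilon, or exploration disabled), exploit.
    return greedy


if __name__ == "__main__":
    q = [0.1, 0.9, 0.3]
    # Exploration disabled: the result is always the greedy index.
    print(epsilon_greedy_action(q, epsilon=0.5, use_epsilon_greedy=False))
```

With exploration disabled, repeated calls on the same Q-values return the same index, which corresponds to the deterministic behavior noted above.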
Creation
Description
policy = rlEpsilonGreedyPolicy(qValueFunction) creates the epsilon-greedy policy
object policy from the discrete action-space Q-value function qValueFunction. It also
sets the QValueFunction property of policy to the input argument qValueFunction.
Properties
Object Functions
getAction | Obtain action from agent, actor, or policy object given environment observations
getLearnableParameters | Obtain learnable parameter values from agent, function approximator, or policy object
reset | Reset environment, agent, experience buffer, or policy object
setLearnableParameters | Set learnable parameter values of agent, function approximator, or policy object
Examples
Version History
Introduced in R2022a
See Also
Functions
getGreedyPolicy | getExplorationPolicy | generatePolicyBlock | generatePolicyFunction | getAction | getLearnableParameters | setLearnableParameters
Objects
rlMaxQPolicy | rlDeterministicActorPolicy | rlAdditiveNoisePolicy | rlStochasticActorPolicy | rlQValueFunction | rlVectorQValueFunction | rlSARSAAgent | rlQAgent | rlDQNAgent