# getMaxQValue

Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations

Since R2020a

## Syntax

```
[maxQ,maxActionIndex] = getMaxQValue(qValueFcnObj,obs)
[maxQ,maxActionIndex,nextState] = getMaxQValue(___)
___ = getMaxQValue(___,UseForward=useForward)
```

## Description

`[maxQ,maxActionIndex] = getMaxQValue(qValueFcnObj,obs)` evaluates the discrete-action-space Q-value function critic `qValueFcnObj` and returns the maximum estimated value over all possible actions, `maxQ`, together with the corresponding action index, `maxActionIndex`, given environment observations `obs`.

`[maxQ,maxActionIndex,nextState] = getMaxQValue(___)` also returns the updated state of `qValueFcnObj` when it contains a recurrent neural network.

`___ = getMaxQValue(___,UseForward=useForward)` allows you to explicitly call a forward pass when computing gradients.

## Examples

Create observation and action specification objects (alternatively, use `getObservationInfo` and `getActionInfo` to extract the specification objects from an environment). For this example, define the observation space as a continuous three-dimensional space, and the action space as a finite set consisting of three possible values (-1, 0, and 1).

```
obsInfo = rlNumericSpec([3 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);
```

Create a default DQN agent and extract its critic.

```
agent = rlDQNAgent(obsInfo,actInfo);
critic = getCritic(agent);
```

Use `getMaxQValue` to return the maximum value, among the possible actions, given a random observation. Also return the index corresponding to the action that maximizes the value.

```
[v,i] = getMaxQValue(critic,{rand(3,1)})
```
```
v = single
    -0.0430

i = 3
```

Create a batch set of 64 random independent observations. The third dimension is the batch size, and the fourth is the sequence length for any recurrent neural network used by the critic (not used in this case).

```
batchobs = rand(3,1,64,1);
```

Obtain maximum values for all the observations.

```
bv = getMaxQValue(critic,{batchobs});
size(bv)
```
```
ans = 1×2

     1    64
```

Select the maximum value corresponding to the 44th observation.

```
bv(44)
```
```
ans = single
    -0.0516
```
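
If the critic uses a recurrent neural network, `getMaxQValue` also accepts a sequence dimension and returns the updated network state. As a sketch, assuming `rlAgentInitializationOptions` and its `UseRNN` option are available in your release, you can create such a critic and evaluate one observation sequence:

```
% Sketch: DQN agent whose default critic uses a recurrent network.
initOpts = rlAgentInitializationOptions(UseRNN=true);
rnnAgent = rlDQNAgent(obsInfo,actInfo,initOpts);
rnnCritic = getCritic(rnnAgent);

% One observation sequence: batch size 1 (third dimension),
% sequence length 9 (fourth dimension).
seqObs = rand(3,1,1,9);
[seqQ,seqIdx,nextState] = getMaxQValue(rnnCritic,{seqObs});
size(seqQ)   % 1-by-1-by-9: one maximum Q value per sequence step
```

Because the critic is recurrent, the third output `nextState` contains the updated network state.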

## Input Arguments

`qValueFcnObj` — Q-value function critic, specified as an `rlQValueFunction` or `rlVectorQValueFunction` object.

`obs` — Environment observations, specified as a cell array with as many elements as there are observation input channels. Each element of `obs` contains an array of observations for a single observation input channel.

The dimensions of each element in `obs` are MO-by-LB-by-LS, where:

• MO corresponds to the dimensions of the associated observation input channel.

• LB is the batch size. To specify a single observation, set LB = 1. For a batch of observations, set LB > 1. If `qValueFcnObj` has multiple observation input channels, then LB must be the same for all elements of `obs`.

• LS specifies the sequence length for a recurrent neural network. If `qValueFcnObj` does not use a recurrent neural network, then LS = 1. If `qValueFcnObj` has multiple observation input channels, then LS must be the same for all elements of `obs`.

For more information on input and output formats for recurrent neural networks, see the Algorithms section of `lstmLayer`.
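
For example, for the single 3-by-1 observation channel used in the Examples section, a batch of five observations (LB = 5, LS = 1) is packaged as a one-element cell array (a sketch; `obsBatch` is an illustrative name):

```
% One cell element per observation channel; here, one channel of
% dimension 3-by-1 with batch size 5 and sequence length 1.
obsBatch = {rand(3,1,5)};
maxQ = getMaxQValue(critic,obsBatch);   % maxQ is 1-by-5
```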

`useForward` — Option to use forward pass, specified as a logical value. When you specify `UseForward=true`, the function calculates its outputs using `forward` instead of `predict`. This allows layers such as batch normalization and dropout to appropriately change their behavior for training.

Example: `true`
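
For instance, the following sketch forces a training-mode pass through the critic from the Examples section (this changes behavior only when the critic network contains layers such as dropout or batch normalization):

```
% Evaluate using forward (training behavior) instead of predict.
maxQ = getMaxQValue(critic,{rand(3,1)},UseForward=true);
```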

## Output Arguments

`maxQ` — Maximum Q-value estimate across all possible discrete actions, returned as a 1-by-LB-by-LS array, where:

• LB is the batch size.

• LS specifies the sequence length for a recurrent neural network. If `qValueFcnObj` does not use a recurrent neural network, then LS = 1.

`maxActionIndex` — Action index corresponding to the maximum Q value, returned as a 1-by-LB-by-LS array, where:

• LB is the batch size.

• LS specifies the sequence length for a recurrent neural network. If `qValueFcnObj` does not use a recurrent neural network, then LS = 1.

`nextState` — Updated state of `qValueFcnObj`, returned as a cell array. If `qValueFcnObj` does not use a recurrent neural network, then `nextState` is an empty cell array.

You can set the state of the critic to `nextState` using dot notation. For example:

`qValueFcnObj.State = nextState;`
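
As a sketch, assuming a critic that uses a recurrent network (such as `rnnCritic` from the Examples section), you can carry the state across sequential calls:

```
% Evaluate one step, then feed the updated state back into the critic.
[maxQ,maxIdx,nextState] = getMaxQValue(rnnCritic,{rand(3,1)});
rnnCritic.State = nextState;
```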

## Tips

When the elements of the cell array `obs` are `dlarray` objects, the outputs of `getMaxQValue` are also `dlarray` objects. This allows `getMaxQValue` to be used with automatic differentiation.

Specifically, you can write a custom loss function that directly uses `getMaxQValue` and `dlgradient` within it, and then use `dlfeval` and `dlaccelerate` with your custom loss function. For an example, see Train Reinforcement Learning Policy Using Custom Training Loop and Custom Training Loop with Simulink Action Noise.
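
The following sketch illustrates the pattern; the `qLoss` function and its choice to differentiate with respect to the observation (rather than the critic parameters) are illustrative assumptions, not taken from those examples:

```
% Call the loss through dlfeval so dlgradient can trace getMaxQValue.
[loss,grad] = dlfeval(@qLoss,critic,dlarray(rand(3,1)));

% Sketch of a loss that differentiates the maximum Q value with
% respect to the observation (local function at the end of a script).
function [loss,grad] = qLoss(critic,dlObs)
    maxQ = getMaxQValue(critic,{dlObs},UseForward=true);
    loss = sum(maxQ,"all");
    grad = dlgradient(loss,dlObs);   % d(loss)/d(observation)
end
```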

## Version History

Introduced in R2020a