In TrainMBPOAgentToBalanceCartPoleSystemExample/ cartPoleRewardFunction ,(nextObs)is what?
Show older comments
function reward = cartPoleRewardFunction(obs,action,nextObs)
% Compute reward value based on the next observation.
if iscell(nextObs)
nextObs = nextObs{1};
end
% Distance at which to fail the episode
xThreshold = 2.4;
% Reward each time step the cart-pole is balanced
rewardForNotFalling = 1;
% Penalty when the cart-pole fails to balance
penaltyForFalling = -50;
x = nextObs(1,:);
distReward = 1 - abs(x)/xThreshold;
isDone = cartPoleIsDoneFunction(obs,action,nextObs);
reward = zeros(size(isDone));
reward(logical(isDone)) = penaltyForFalling;
reward(~logical(isDone)) = ...
0.5 * rewardForNotFalling + 0.5 * distReward(~logical(isDone));
end
I really want to know where nextObs is passing this function in from? Why can't I find this variable in the main function.
If my environment is built from Simulink, how do I get the nextObs variable?
Accepted Answer
More Answers (0)
Categories
Find more on Environments in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!
