Deep Reinforcement Learning in Action by Alexander Zai and Brandon Brown

Author: Alexander Zai, Brandon Brown
Language: eng
Format: mobi, epub, pdf
Publisher: Manning Publications
Published: 2020-03-28T15:14:44.670000+00:00


return target_dist_batch

1 Loops through the batch dimension

2 If the reward is not –1, it is a terminal state and the target is a degenerate distribution at the reward value.

3 If the state is nonterminal, the target distribution is a Bayesian update of the prior given the reward.

4 Only changes the distribution for the action that was taken

The get_target_dist function takes a batch of distributions of shape B × 3 × 51, where B is the batch size, and returns a tensor of the same shape. For example, if the batch holds a single example (1 × 3 × 51) and the agent took action 1 and observed a reward of –1, the function returns a 1 × 3 × 51 tensor that matches the input except for the 1 × 51 distribution at index 1 (of dimension 1), which is changed by the update_dist function using the observed reward of –1. If the observed reward were instead 10, the 1 × 51 distribution for action 1 would become a degenerate distribution in which every element has probability 0 except the one associated with a reward of 10 (index 50).
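The behavior described above can be sketched in NumPy as follows. This is a minimal illustration, not the book's listing: it assumes the support spans –10 to 10 in 51 evenly spaced steps (consistent with a reward of 10 landing on index 50), and `reward_to_index`, `placeholder_update_dist`, and `get_target_dist_sketch` are hypothetical names. In particular, `placeholder_update_dist` is only a stand-in that blends the prior with a one-hot vector; it does not reproduce the update_dist function the text refers to.

```python
import numpy as np

def reward_to_index(r, vmin=-10.0, vmax=10.0, nsup=51):
    """Map a reward to the nearest index on an evenly spaced support (assumed range)."""
    dz = (vmax - vmin) / (nsup - 1)
    return int(np.clip(np.round((r - vmin) / dz), 0, nsup - 1))

def placeholder_update_dist(r, dist, mix=0.5):
    """Hypothetical stand-in for update_dist: blend the prior with a one-hot at the reward."""
    one_hot = np.zeros_like(dist)
    one_hot[reward_to_index(r, nsup=dist.shape[0])] = 1.0
    posterior = (1.0 - mix) * dist + mix * one_hot
    return posterior / posterior.sum()

def get_target_dist_sketch(dist_batch, action_batch, reward_batch,
                           update_fn=placeholder_update_dist):
    """Return a copy of dist_batch (B x A x N) with only the taken action's row changed."""
    nsup = dist_batch.shape[2]
    target_dist_batch = dist_batch.copy()
    for i in range(dist_batch.shape[0]):       # loop over the batch dimension
        action = int(action_batch[i])
        r = float(reward_batch[i])
        if r != -1:                            # terminal state: degenerate distribution
            target = np.zeros(nsup)
            target[reward_to_index(r, nsup=nsup)] = 1.0
        else:                                  # nonterminal state: update the prior
            target = update_fn(r, dist_batch[i, action])
        target_dist_batch[i, action, :] = target
    return target_dist_batch

# One example in the batch, action 1 taken, terminal reward of 10.
dists = np.full((1, 3, 51), 1.0 / 51)
out = get_target_dist_sketch(dists, np.array([1]), np.array([10]))
print(out.shape, int(out[0, 1].argmax()))
```

Only the row for the action that was taken is overwritten; the distributions for the other two actions pass through unchanged, which is why the output has the same shape as the input.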


