Deep Reinforcement Learning Hands-On: Apply modern RL methods, with deep Q-networks, value iteration, policy gradients, TRPO, AlphaGo Zero and more by Maxim Lapan
Author: Maxim Lapan
Language: eng
Format: epub
Publisher: Packt Publishing
Published: 2018-06-20T23:00:00+00:00
Exploration
Even with the policy represented as a probability distribution, there is a high chance that the agent will converge to some locally optimal policy and stop exploring the environment. In DQN, we solved this using epsilon-greedy action selection: with probability epsilon, the agent took a random action instead of the action dictated by the current policy. We can use the same approach, of course, but PG allows us to follow a better path, called the entropy bonus.
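As a reminder of the epsilon-greedy rule mentioned above, here is a minimal sketch in plain Python. The function name `epsilon_greedy` and the list-of-Q-values input are assumptions made for illustration, not code from the book.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, greedy otherwise.

    q_values is assumed to be a list of estimated action values, one per action.
    """
    if rng.random() < epsilon:
        # Explore: ignore the value estimates and pick any action uniformly.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0.0` the rule is purely greedy, and with `epsilon=1.0` every action is chosen at random.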
In information theory, entropy is a measure of uncertainty in some system. Applied to an agent's policy, entropy shows how uncertain the agent is about which action to take. In math notation, the entropy of the policy is defined as: $H(\pi) = -\sum_a \pi(a|s) \log \pi(a|s)$. The value of entropy is never negative and has a single maximum when the policy is uniform, in other words, when all actions have the same probability. Entropy reaches its minimum of zero when our policy has 1 for some action and 0 for all others, which means that the agent is absolutely sure what to do. To prevent our agent from being stuck in the local minimum, we subtract the entropy from the loss function, punishing the agent for being too certain about the action to take.
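The properties of entropy described above can be checked with a small sketch in plain Python. The helper names and the scaling coefficient `entropy_beta` (with its default of 0.01) are assumptions made for illustration. The text only says that entropy is subtracted from the loss, so the scaling factor is an added detail.

```python
import math


def entropy(probs):
    """Entropy H = -sum(p * log p) of a discrete action distribution.

    Zero-probability actions are skipped, following the convention 0 * log 0 = 0.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def loss_with_entropy_bonus(policy_loss, probs, entropy_beta=0.01):
    """Subtract the scaled entropy from the loss.

    entropy_beta is a hypothetical hyperparameter: larger values penalize
    overconfident (low-entropy) policies more strongly.
    """
    return policy_loss - entropy_beta * entropy(probs)


uniform = [0.25, 0.25, 0.25, 0.25]   # maximal uncertainty over 4 actions
peaked = [0.7, 0.1, 0.1, 0.1]        # the agent prefers one action
certain = [1.0, 0.0, 0.0, 0.0]       # the agent is absolutely sure

for name, probs in [("uniform", uniform), ("peaked", peaked), ("certain", certain)]:
    print(name, entropy(probs), loss_with_entropy_bonus(1.0, probs))
```

For the same policy loss, the uniform policy ends up with the lowest total loss and the fully certain policy with the highest, which is how the bonus pushes the agent toward continued exploration.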
