Reinforcement Learning of Bimanual Robot Skills by Adrià Colomé & Carme Torras

Author: Adrià Colomé & Carme Torras
Language: eng
Format: epub
ISBN: 9783030263263
Publisher: Springer International Publishing


We now briefly present two of the most popular PS algorithms in the literature, REPS and PI², both of which are used in the experiments throughout this monograph.

5.1.1.1 Relative Entropy Policy Search (REPS)

Formally, REPS [3, 13] finds the policy that maximizes the expected reward for a given task. It relies on the Kullback-Leibler (KL) divergence [8], a non-symmetric measure of the difference between two probability distributions p and q over a random variable x:

$$ D_{\mathrm{KL}}(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx \qquad (5.5) $$
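As a concrete check of Eq. (5.5), the sketch below evaluates its discrete analogue, where the integral becomes a sum over the values of x, for two hypothetical three-outcome distributions `p` and `q`. Swapping the arguments gives a different value, which is the non-symmetry mentioned above.

```python
import math


def kl_divergence(p, q):
    """Discrete KL divergence D_KL(p || q) = sum_x p(x) * log(p(x) / q(x)).

    Terms with p(x) == 0 contribute 0 by convention; if q(x) == 0 where
    p(x) > 0, the divergence is infinite.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue
        if qx == 0.0:
            return math.inf
        total += px * math.log(px / qx)
    return total


# Hypothetical example distributions over three outcomes.
p = [0.5, 0.4, 0.1]
q = [0.3, 0.3, 0.4]
print(kl_divergence(p, q))  # ≈ 0.232
print(kl_divergence(q, p))  # ≈ 0.315, so D_KL(p || q) != D_KL(q || p)
```

The divergence is zero only when the two distributions coincide, and it grows without bound as q assigns vanishing probability to outcomes that p considers likely.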

Given the previous policy, the new policy is obtained by maximizing the expected reward subject to a bound on the KL divergence between the new policy and the previous one. This bound limits how much the policy can change in a single update and prevents the PS algorithm from being too greedy. Overly greedy updates can be inappropriate in some robotics applications, where a drastic change in the policy may result in unpredictable, dangerous robot behavior. The new policy is thus computed as the solution of a constrained optimization problem.
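For illustration, the episodic form of this problem as it is commonly stated in the policy search literature reads as follows. The notation is an assumption and may differ from the book's: ω are the policy parameters, q(ω) the previous policy, π(ω) the new one, R(ω) the return and ε the KL bound.

$$ \max_{\pi} \int \pi(\omega) R(\omega)\, d\omega \quad \text{s.t.} \quad \int \pi(\omega) \log \frac{\pi(\omega)}{q(\omega)}\, d\omega \le \epsilon, \qquad \int \pi(\omega)\, d\omega = 1 $$

Its solution has the form π(ω) ∝ q(ω) exp(R(ω)/η), where the temperature η > 0 minimizes the dual g(η) = ηε + η log ∫ q(ω) exp(R(ω)/η) dω. The Python sketch below is a minimal sample-based version, assuming a Gaussian policy and N parameter samples drawn from q. Each sample receives a weight proportional to exp(R_i/η). Setting g'(η) = 0 is equivalent to requiring that the normalized weights lie at KL divergence ε from uniform weights on the samples, so η can be found by bisection. The new Gaussian is then a weighted maximum-likelihood fit. The function names (`reps_weights`, `reps_gaussian_update`), the bisection bracket and the toy quadratic reward are hypothetical choices for this sketch.

```python
import numpy as np


def kl_to_uniform(p):
    """KL(p || uniform) for a discrete distribution p over p.size outcomes."""
    p = np.asarray(p, dtype=float)
    nz = p > 0  # 0 * log 0 is taken as 0
    return float(np.sum(p[nz] * np.log(p.size * p[nz])))


def reps_weights(rewards, epsilon, eta_min=1e-8, eta_max=1e8, iters=200):
    """Sample weights p_i ∝ exp(R_i / eta) of sample-based episodic REPS.

    The dual's stationarity condition is KL(p || uniform) = epsilon, and that
    KL decreases monotonically in eta, so we bisect on log(eta).
    """
    A = np.asarray(rewards, dtype=float)
    A = A - A.max()  # shift rewards so exp() cannot overflow

    def weights(eta):
        w = np.exp(A / eta)
        return w / w.sum()

    lo, hi = np.log(eta_min), np.log(eta_max)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kl_to_uniform(weights(np.exp(mid))) > epsilon:
            lo = mid  # still too greedy: raise the temperature
        else:
            hi = mid  # bound satisfied: try a lower temperature
    eta = float(np.exp(hi))
    return weights(eta), eta


def reps_gaussian_update(samples, rewards, epsilon, reg=1e-6):
    """Weighted maximum-likelihood refit of a Gaussian policy to REPS weights."""
    X = np.asarray(samples, dtype=float)
    w, _ = reps_weights(rewards, epsilon)
    mean = w @ X
    diff = X - mean
    cov = (w[:, None] * diff).T @ diff
    # Symmetrize against round-off and keep the covariance positive definite.
    cov = 0.5 * (cov + cov.T) + reg * np.eye(X.shape[1])
    return mean, cov


# Toy usage: a hypothetical quadratic reward with its optimum at `target`.
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0])
mean, cov = np.zeros(2), np.eye(2)
for _ in range(30):
    samples = rng.multivariate_normal(mean, cov, size=50)
    rewards = -np.sum((samples - target) ** 2, axis=1)
    mean, cov = reps_gaussian_update(samples, rewards, epsilon=0.5)
print(mean)  # approaches target as exploration shrinks
```

A smaller ε yields weights closer to uniform and therefore more cautious updates, which is the behavior motivated above for robotic applications. In this sample-based form, the KL bound holds exactly only for the reweighted samples; the refitted Gaussian satisfies it approximately.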


