Dynamic Games for Network Security by Xiaofan He & Huaiyu Dai

Author: Xiaofan He & Huaiyu Dai
Language: eng
Format: epub
Publisher: Springer International Publishing, Cham


(3.14)

Unlike the minimax-PDS algorithm, a WoLF-PDS agent must also maintain a record of the empirical average performance of taking action a at state s. To this end, the agent updates an empirical reward function at each timeslot, for all a, using the following equation

(3.15)
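As an illustration of what an empirical average reward record can look like, here is a minimal Python sketch. It assumes a plain incremental mean per (state, action) pair and assumes that a reward sample is available for each action a in the current timeslot; the class name `EmpiricalReward` and the method `update_all` are hypothetical, and the exact form of the book's update may differ.

```python
from collections import defaultdict


class EmpiricalReward:
    """Running average of observed rewards per (state, action) pair.

    Assumption: a simple incremental mean; not taken from the source text.
    """

    def __init__(self):
        self.mean = defaultdict(float)  # (s, a) -> average reward so far
        self.count = defaultdict(int)   # (s, a) -> number of samples seen

    def update_all(self, s, rewards_by_action):
        """Fold one reward sample per action into the averages for state s."""
        for a, reward in rewards_by_action.items():
            key = (s, a)
            self.count[key] += 1
            # Incremental mean: new = old + (sample - old) / n
            self.mean[key] += (reward - self.mean[key]) / self.count[key]


r_bar = EmpiricalReward()
r_bar.update_all("s0", {"a0": 0.0, "a1": 1.0})
r_bar.update_all("s0", {"a0": 0.0, "a1": 3.0})
```

Keeping a separate sample count per pair means the average stays correct even if some actions receive no sample in a given timeslot.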

Once both the updated PDS quality function and the empirical reward function have been obtained, the standard Q-function is adjusted by

(3.16)

The above equation indicates that the WoLF-PDS algorithm can update multiple standard Q-functions at each timeslot and thus substantially speed up learning. The remaining steps of WoLF-PDS, which update the state occurrence count C, the empirical average policy, and the policy π (wp), are the same as in the original WoLF algorithm discussed in Chap. 1.
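The two stages described above can be sketched in Python with NumPy. This is a rough illustration under stated assumptions, not the book's exact update: `refresh_q` assumes the usual PDS decomposition, in which Q(s, a) is the empirical reward plus the expected PDS value under known transition probabilities, and `wolf_policy_step` follows the standard WoLF policy hill-climbing step. All function names, array layouts, and the step sizes `delta_win` and `delta_lose` are hypothetical.

```python
import numpy as np


def refresh_q(r_bar, p_known, q_pds):
    """Rebuild Q(s, a) for every (s, a) pair at once (assumed PDS form).

    r_bar:   (S, A) empirical average rewards
    p_known: (S, A, P) known probabilities of reaching each post-decision state
    q_pds:   (P,) PDS quality values
    """
    return r_bar + p_known @ q_pds


def wolf_policy_step(q, pi, pi_avg, count, s, delta_win=0.01, delta_lose=0.04):
    """One WoLF policy update at state s; modifies pi, pi_avg, count in place."""
    count[s] += 1
    # Move the empirical average policy toward the current policy.
    pi_avg[s] += (pi[s] - pi_avg[s]) / count[s]
    # "Winning" if the current policy beats the average policy under Q.
    winning = pi[s] @ q[s] > pi_avg[s] @ q[s]
    delta = delta_win if winning else delta_lose  # learn fast when losing
    best = int(np.argmax(q[s]))
    n_actions = q.shape[1]
    # Take at most delta / (|A| - 1) from each non-greedy action, never
    # more than it currently holds, and give the total to the greedy action.
    step = np.minimum(pi[s], delta / (n_actions - 1))
    step[best] = 0.0
    pi[s] -= step
    pi[s, best] += step.sum()


S, A, P = 2, 3, 4
rng = np.random.default_rng(0)
r_bar = rng.random((S, A))
p_known = rng.dirichlet(np.ones(P), size=(S, A))  # each (s, a) row sums to 1
q_pds = rng.random(P)

q = refresh_q(r_bar, p_known, q_pds)
pi = np.full((S, A), 1.0 / A)
pi_avg = np.full((S, A), 1.0 / A)
count = np.zeros(S, dtype=int)
wolf_policy_step(q, pi, pi_avg, count, s=0)
```

Because `refresh_q` recomputes the whole (S, A) table from a single set of PDS values, one timeslot of experience touches many Q entries, which is the speed-up the text refers to.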

Neither the conventional WoLF algorithm nor the WoLF-PDS algorithm shares the convergence property of the minimax-PDS algorithm. Nonetheless, the WoLF-PDS algorithm inherits the rationality property of the WoLF algorithm. In particular, we have the following result.

Theorem 5 ([19])

The WoLF-PDS algorithm is rational when the learning rate sequence {α_n}, n ≥ 1, satisfies 0 ≤ α_n < 1, and .


