Playing Smart by Julian Togelius



Trial and Error on Speed

Evolutionary computation can be described as a process of massive trial and error. It seems to be an enormously wasteful process—all those neural nets that are somewhat worse than the best neural nets of each generation are simply thrown away. None of the information they encountered in their brief “lives” is saved. Yet the process of evolution through selection works, both in nature (as we are living proof of) and inside computer programs. But is there another way we could learn from experience to create effective AI, perhaps preserving more information?

The problem of learning to perform a task given only intermittent feedback about how well you’re doing is called the reinforcement learning problem, importing some terminology from behaviorist psychology (the kind where psychologists make rats pull levers and run around in mazes) to computer science. There are essentially two broad approaches to solving these problems. The less common is to use some form of evolutionary algorithm. The more common is to use some form of approximate dynamic programming, such as the Q-learning algorithm.
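
To make this setting concrete, here is a tiny toy example of the reinforcement learning problem in Python: an agent wanders a short corridor and only ever receives feedback if it stumbles onto the goal. The corridor, the policy, and all the names are invented purely for illustration.

```python
# A toy illustration of intermittent feedback: the agent wanders a corridor
# and the only reward it ever sees arrives if it reaches the far end.
import random

def run_episode(policy, length=10, max_steps=100):
    position = 0
    for _ in range(max_steps):
        action = policy(position)             # -1 = step left, +1 = step right
        position = max(0, position + action)
        if position == length:
            return 1.0                        # the one and only reward signal
    return 0.0                                # otherwise: no feedback at all

def random_policy(position):
    return random.choice([-1, +1])

print(run_episode(random_policy))             # usually 0.0, occasionally 1.0
```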

You can think of it this way: whereas evolutionary computing models the type of learning that takes place across multiple lifetimes, Q-learning (and similar algorithms) models the kind of learning that takes place during a lifetime. Instead of learning based on a single fitness value at the end of an attempt to perform a task (as evolution does), Q-learning can learn from many events as the task is performed. Instead of making random changes to the complete neural network (as happens in evolution), in Q-learning the changes are made in specific directions in response to positive or negative rewards.
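
The simplest, table-based form of the Q-learning update makes this concrete: after every single event, the value estimate for that one state-action pair is nudged a little toward what was actually observed. A minimal sketch, using a plain Python dictionary as the Q "table":

```python
# Textbook tabular Q-learning update: one small, directed adjustment per event,
# rather than one fitness score per lifetime. alpha is the learning rate,
# gamma the discount factor for future rewards.
def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    target = reward + gamma * best_next           # what the action turned out to be worth
    old = Q.get((state, action), 0.0)             # what we previously thought
    Q[(state, action)] = old + alpha * (target - old)

Q = {}
q_update(Q, state="s0", action="right", reward=1.0, next_state="s1",
         actions=["left", "right"])
print(Q)                                          # {('s0', 'right'): 0.1}
```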

In Q-learning, the neural network takes inputs that represent what the agent “sees,” just like the evolved car control network I described in the previous section. The network also takes inputs describing the action the agent is considering; in the car racing domain, that could be steer left, steer right, accelerate, or brake (or some combination). The output is a Q-value, which is an estimate of how good a particular action would be in a particular state (situation). So instead of mapping sensor inputs to actions, the network maps sensor inputs and actions to Q-values. The way this neural network is used to do something, such as driving a car, is that every time it needs to make a decision, it tests all possible actions and takes the one with the highest Q-value in the current state.
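
A rough sketch of how such a state-action network might be wired up and queried; the four driving actions and the tiny two-layer network below are illustrative choices, not a prescription:

```python
# Picking an action with a state-action Q-network: feed in the sensor readings
# together with a one-hot encoding of each candidate action, and keep whichever
# action gets the highest predicted Q-value.
import numpy as np

ACTIONS = ["steer_left", "steer_right", "accelerate", "brake"]

def make_network(num_sensors, num_actions, hidden=16, seed=0):
    rng = np.random.default_rng(seed)
    return {                                    # random weights: an untrained network
        "W1": rng.normal(size=(num_sensors + num_actions, hidden)) * 0.1,
        "W2": rng.normal(size=hidden) * 0.1,
    }

def q_value(net, sensors, action_index):
    one_hot = np.zeros(len(ACTIONS))
    one_hot[action_index] = 1.0
    x = np.concatenate([sensors, one_hot])      # state and action go in together
    h = np.tanh(x @ net["W1"])                  # hidden layer
    return float(h @ net["W2"])                 # a single Q-value comes out

def choose_action(net, sensors):
    values = [q_value(net, sensors, i) for i in range(len(ACTIONS))]
    return ACTIONS[int(np.argmax(values))]      # act greedily on the estimates

net = make_network(num_sensors=5, num_actions=len(ACTIONS))
print(choose_action(net, sensors=np.array([0.2, 0.8, 0.5, 0.1, 0.9])))
```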

Obviously the neural network needs to be trained before it is useful; a network that outputs random Q-values is not going to win any races or solve any other problems, for that matter. The basic idea of training a neural network using Q-learning is to compare the predicted value of taking an action in a state with the actual value of taking the action in the state, as observed after having taken it. If the actual value differs from the predicted value, the neural network is adjusted a little bit using the backpropagation algorithm.
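
Here is roughly what one such training step looks like. To keep it to a few lines, the sketch uses a simple linear value estimator in place of a full neural network; with a multilayer network, the same prediction error would be pushed back through the layers by backpropagation. All the numbers are made up for illustration.

```python
# One Q-learning training step with a simple linear estimator: compare the
# predicted value of the action with the value observed one step later, then
# nudge the weights to shrink the difference.
import numpy as np

def predict_q(weights, features):
    return float(weights @ features)             # features encode state and action

def td_train_step(weights, features, reward, best_next_q, alpha=0.01, gamma=0.9):
    predicted = predict_q(weights, features)     # what we thought the action was worth
    observed = reward + gamma * best_next_q      # what it looked like after taking it
    error = observed - predicted                 # positive: better than expected
    return weights + alpha * error * features    # small step in the right direction

rng = np.random.default_rng(1)
weights = rng.normal(size=9) * 0.1               # e.g., 5 sensor values + 4 action bits
features = rng.normal(size=9)
weights = td_train_step(weights, features, reward=1.0, best_next_q=0.5)
```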


