Playing Smart by Julian Togelius
Author: Julian Togelius
Language: eng
Tags: AI; algorithm; video games; videogames; Mario; Angry Birds; DOOM; Tomb Raider; World of Warcraft; WOW; Chess; Go; Call of Duty
Publisher: The MIT Press
Published: 2018-11-05T16:00:00+00:00
Trial and Error on Speed
Evolutionary computation can be described as a process of massive trial and error. It seems to be an enormously wasteful process—all those neural nets that are somewhat worse than the best neural nets of each generation are simply thrown away. None of the information they encountered in their brief “lives” is saved. Yet the process of evolution through selection works, both in nature (as we are living proof of) and inside computer programs. But is there another way we could learn from experience to create effective AI, perhaps preserving more information?
The problem of learning to perform a task given only intermittent feedback about how well you’re doing is called the reinforcement learning problem, importing some terminology from behaviorist psychology (the kind where psychologists make rats pull levers and run around in mazes) to computer science. There are essentially two broad approaches to solving these problems. The less common is to use some form of evolutionary algorithm. The more common is to use some form of approximate dynamic programming, such as the Q-learning algorithm.
You can think of it this way: whereas evolutionary computing models the type of learning that takes place across multiple lifetimes, Q-learning (and similar algorithms) models the kind of learning that takes place during a lifetime. Instead of learning based on a single fitness value at the end of an attempt to perform a task (as evolution does), Q-learning can learn from many events as the task is performed. Instead of making random changes to the complete neural network (as happens in evolution), in Q-learning the changes are made in specific directions in response to positive or negative rewards.
In Q-learning, the neural network takes inputs that represent what the agent “sees,” just like the evolved car control network I described in the previous section. The network also takes inputs describing which action the agent is considering taking; in the car racing domain, it could be steer left, steer right, accelerate, and brake (or some combination). The output is a Q-value, which is an estimate of how good a particular action would be in a particular state (situation). So instead of mapping sensor inputs to actions, the network maps sensor inputs and actions to Q-values. The way this neural network is used to do something, such as driving a car, is that every time it needs to make a decision, it tests all possible actions and takes the one with the highest Q-value in the current state.
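The decision rule described above, which scores every possible action in the current state and takes the highest-scoring one, can be sketched in a few lines of Python. This is only an illustration: `toy_q_value` is a hypothetical hand-written stand-in for a trained network, and the two-number state (free distance ahead, lateral offset) and the action names are assumptions made for the example, not details from the book.

```python
ACTIONS = ("steer_left", "steer_right", "accelerate", "brake")


def greedy_action(q_value, state, actions=ACTIONS):
    """Return the action with the highest Q-value in the given state.

    q_value is any callable mapping (state, action) to a number,
    for example a trained neural network.
    """
    return max(actions, key=lambda action: q_value(state, action))


def toy_q_value(state, action):
    """Hypothetical stand-in for a trained Q-network (not from the book).

    state is (distance_ahead, offset): free road ahead in [0, 1], and how
    far right of the track center the car is (negative means left).
    """
    distance_ahead, offset = state
    scores = {
        "accelerate": distance_ahead,    # open road: speeding up looks good
        "brake": 1.0 - distance_ahead,   # wall close: braking looks good
        "steer_left": offset,            # too far right: steer left
        "steer_right": -offset,          # too far left: steer right
    }
    return scores[action]


if __name__ == "__main__":
    print(greedy_action(toy_q_value, (0.9, 0.1)))
```

Note that the network is queried once per candidate action, so this style of action selection is practical mainly when the set of actions is small and discrete.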
Obviously the neural network needs to be trained before it is useful; a network that outputs random Q-values is not going to win any races or solve any other problems, for that matter. The basic idea of training a neural network using Q-learning is to compare the predicted value of taking an action in a state with the actual value of taking the action in the state, as observed after having taken it. If the actual value differs from the predicted value, the neural network is adjusted a little bit using the backpropagation algorithm.
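A single training step of this kind might look like the sketch below. Two assumptions are worth stating loudly. First, the "actual value" is computed the way standard Q-learning does it: the reward just received plus a discounted estimate of the best Q-value available in the next state. The excerpt does not spell this out, so treat it as the textbook formulation rather than the author's exact wording. Second, to keep the example dependency-free, a simple linear model stands in for the neural network, and a plain gradient step stands in for backpropagation. All function and parameter names (`features`, `q_learning_step`, `alpha`, `gamma`) are hypothetical.

```python
def features(state, action, n_actions):
    """Encode (state, action) as one vector: a copy of the state placed in
    the slot belonging to the chosen action, zeros everywhere else."""
    vec = [0.0] * (len(state) * n_actions)
    start = action * len(state)
    vec[start:start + len(state)] = state
    return vec


def q_value(weights, state, action, n_actions):
    """Linear stand-in for the Q-network (an assumption, not a neural net)."""
    x = features(state, action, n_actions)
    return sum(w * xi for w, xi in zip(weights, x))


def q_learning_step(weights, state, action, reward, next_state, done,
                    n_actions, alpha=0.1, gamma=0.9):
    """Nudge the weights so the prediction moves toward the observed value."""
    predicted = q_value(weights, state, action, n_actions)
    if done:
        # No future after the final step: the observed value is the reward.
        target = reward
    else:
        # Standard Q-learning target: reward plus discounted best next value.
        best_next = max(q_value(weights, next_state, a, n_actions)
                        for a in range(n_actions))
        target = reward + gamma * best_next
    error = target - predicted
    # Gradient step on the squared error; for a linear model the gradient
    # of the prediction with respect to the weights is the feature vector.
    x = features(state, action, n_actions)
    new_weights = [w + alpha * error * xi for w, xi in zip(weights, x)]
    return new_weights, error


if __name__ == "__main__":
    w = [0.0] * 4
    for _ in range(3):
        w, err = q_learning_step(w, [1.0, 0.0], 1, 1.0, [0.0, 1.0], True, 2)
        print(round(err, 3), w)
```

Repeating the same experience shrinks the error each time, which is the "adjusted a little bit" behavior described above: each update moves the prediction only part of the way toward the observed value, with `alpha` controlling how far.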
