Artificial Intelligence by Margaret A. Boden

Author: Margaret A. Boden
Language: eng
Format: epub
Publisher: Oxford University Press
Published: 2018-06-23


Backprop and brains—and deep learning

PDP enthusiasts argue that their networks are more biologically realistic than symbolic AI. It’s true that PDP is inspired by brains, and that some neuroscientists use it to model neural functioning. However, ANNs differ significantly from what lies inside our heads.

One difference between (most) ANNs and brains is back-propagation, or backprop. This is a learning rule—or rather, a general class of learning rules—that’s frequently used in PDP. Anticipated by Paul Werbos in 1974, it was defined more useably by Geoffrey Hinton in the early 1980s. It solves the problem of credit assignment.

This problem arises across all types of AI, especially when the system is continually changing. Given a complex AI system that’s successful, just which parts of it are most responsible for the success? In evolutionary AI, credit is often assigned by the ‘bucket-brigade’ algorithm (see Chapter 5). In PDP systems with deterministic (not stochastic) units, credit is typically assigned by backprop.

The backprop algorithm traces responsibility back from the output layer into the hidden layers, identifying the individual units that need to be adapted. (The weights are updated to minimize prediction errors.) The algorithm needs to know the precise state of the output layer when the network is giving the right answer. (So backprop is supervised learning.) Unit-by-unit comparisons are made between this exemplary output and the output actually obtained from the network. Any difference between an output unit’s activity in the two cases counts as an error.
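
To fix ideas, the forward pass and the unit-by-unit comparison can be sketched in a few lines of Python. Everything here is an assumption made for illustration: a one-hidden-layer network with sigmoid units, made-up layer sizes, random weights, and a single input pattern paired with its exemplary target output.

    import numpy as np

    # Sketch: one forward pass through a one-hidden-layer network, then the
    # unit-by-unit comparison with the exemplary ('right answer') output.
    # Layer sizes, the sigmoid activation, and all names are assumptions.

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    n_in, n_hidden, n_out = 3, 4, 2
    W_ih = rng.normal(scale=0.5, size=(n_hidden, n_in))   # input -> hidden weights
    W_ho = rng.normal(scale=0.5, size=(n_out, n_hidden))  # hidden -> output weights

    x = np.array([0.2, 0.7, 0.1])      # one input pattern
    target = np.array([1.0, 0.0])      # the exemplary output (supervised learning)

    hidden = sigmoid(W_ih @ x)         # hidden-unit activities
    output = sigmoid(W_ho @ hidden)    # output actually obtained from the network

    error = target - output            # any difference counts as an error
    print('per-unit output error:', error)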

The algorithm assumes that error in an output unit is due to error(s) in the units connected to it. Working backwards through the system, it attributes a specific amount of error to each unit in the first hidden layer, depending on the connection weight between it and the output unit. Blame is shared between all the hidden units connected to the mistaken output unit. (If a hidden unit is linked to several output units, its mini-blames are summed.) Proportional weight changes are then made to the connections between the hidden layer and the preceding layer.

That layer may be another (and another …) stratum of hidden units. But ultimately it will be the input layer, and the weight changes will stop. This process is iterated until the discrepancies at the output layer are minimized.
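
A minimal sketch of the whole cycle, under the same assumptions (sigmoid units, a made-up learning rate and tolerance, a single training pattern), shows the blame being passed back through the connection weights, summed per hidden unit, turned into proportional weight changes, and iterated until the output discrepancy is small:

    import numpy as np

    # Sketch of the whole cycle: forward pass, backward blame assignment,
    # proportional weight changes, iterated until the output discrepancy is
    # small. The learning rate, tolerance, and derivative terms are standard
    # assumptions rather than details given in the text.

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    n_in, n_hidden, n_out = 3, 4, 2
    W_ih = rng.normal(scale=0.5, size=(n_hidden, n_in))   # input -> hidden
    W_ho = rng.normal(scale=0.5, size=(n_out, n_hidden))  # hidden -> output

    x = np.array([0.2, 0.7, 0.1])
    target = np.array([1.0, 0.0])
    lr = 0.5                                  # learning rate (assumed)

    for step in range(10_000):
        hidden = sigmoid(W_ih @ x)            # forward pass
        output = sigmoid(W_ho @ hidden)

        error = target - output
        if np.max(np.abs(error)) < 0.01:      # discrepancies minimized (to tolerance)
            break

        # blame at each output unit, then mini-blames for the hidden units,
        # weighted by the connections and summed per hidden unit (W_ho.T @ ...)
        delta_out = error * output * (1.0 - output)
        delta_hidden = (W_ho.T @ delta_out) * hidden * (1.0 - hidden)

        # proportional weight changes, working backwards towards the input layer
        W_ho += lr * np.outer(delta_out, hidden)
        W_ih += lr * np.outer(delta_hidden, x)

    print(f'stopped after {step + 1} steps; output = {output}')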

For many years, backprop was used only on networks with one hidden layer. Multilayer networks were rare: they are difficult to analyse, and even to experiment with. Recently, however, the advent of deep learning has made them the focus of huge excitement, and of some irresponsible hype. Here, a system learns structure reaching deep into a domain, as opposed to mere superficial patterns. In other words, it discovers a multilevel knowledge representation, not a single-level one.
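
Mechanically, nothing in the procedure limits it to one hidden layer: the backward pass can simply work through one stratum after another until it reaches the input layer. The sketch below (again with assumed sizes, sigmoid units, and a toy training pattern) makes that generalization explicit.

    import numpy as np

    # Sketch of the same procedure for several hidden layers: the backward
    # pass works through one stratum after another until it reaches the
    # input layer. Sizes, sigmoid units, and the learning rate are assumed.

    rng = np.random.default_rng(0)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    layer_sizes = [3, 5, 4, 2]                # input, two hidden strata, output
    weights = [rng.normal(scale=0.5, size=(m, n))
               for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]

    x = np.array([0.2, 0.7, 0.1])
    target = np.array([1.0, 0.0])
    lr = 0.5

    for step in range(10_000):
        activities = [x]                      # forward pass, keeping every layer
        for W in weights:
            activities.append(sigmoid(W @ activities[-1]))

        error = target - activities[-1]
        if np.max(np.abs(error)) < 0.01:
            break

        # backward pass: blame at the output, then layer by layer towards the input
        delta = error * activities[-1] * (1.0 - activities[-1])
        for i in reversed(range(len(weights))):
            change = lr * np.outer(delta, activities[i])
            if i > 0:                         # pass the blame one stratum further back
                delta = (weights[i].T @ delta) * activities[i] * (1.0 - activities[i])
            weights[i] += change

    print(f'output after training: {activities[-1]}')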

Deep learning is exciting because it promises to enable ANNs, at last, to deal with hierarchy. Since the early 1980s, connectionists such as Hinton and Jeff Elman have struggled to represent hierarchy, whether by combining local/distributed representation or by defining recurrent nets. (Recurrent nets, in effect, perform as a sequence of discrete steps. Recent versions, using deep learning, can sometimes predict the next word in a sentence, or even the next ‘thought’ in a paragraph.)
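
That ‘sequence of discrete steps’ can itself be sketched: in an Elman-style recurrent net, the hidden layer’s new activities depend both on the current input and on the hidden activities left over from the previous step, so a sentence is processed word by word with a carried-over context. The toy vocabulary, one-hot input coding, and untrained random weights below are all illustrative assumptions; the ‘predictions’ of an untrained net are of course meaningless.

    import numpy as np

    # Sketch of an Elman-style recurrent step: the hidden layer's new
    # activities depend on the current input AND on the hidden activities
    # from the previous step, so a sentence is processed as a sequence of
    # discrete steps. Vocabulary, one-hot coding, and untrained random
    # weights are illustrative assumptions.

    rng = np.random.default_rng(0)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    vocab = ['the', 'cat', 'sat', 'on', 'mat']
    n_vocab, n_hidden = len(vocab), 8

    W_in  = rng.normal(scale=0.5, size=(n_hidden, n_vocab))   # input  -> hidden
    W_rec = rng.normal(scale=0.5, size=(n_hidden, n_hidden))  # hidden -> hidden (the recurrence)
    W_out = rng.normal(scale=0.5, size=(n_vocab, n_hidden))   # hidden -> output

    def one_hot(word):
        v = np.zeros(n_vocab)
        v[vocab.index(word)] = 1.0
        return v

    hidden = np.zeros(n_hidden)               # the context carried between steps
    for word in ['the', 'cat', 'sat']:
        hidden = sigmoid(W_in @ one_hot(word) + W_rec @ hidden)
        scores = W_out @ hidden               # one score per word in the vocabulary
        guess = vocab[int(np.argmax(scores))]
        print(f"after '{word}', the (untrained) net guesses next: {guess}")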


