Deep Learning by John D. Kelleher
Author: John D. Kelleher
Language: eng
Format: epub, pdf
Tags: Deep Learning; Neural Networks; Artificial Intelligence; Machine Learning; Big Data; Backpropagation; Convolutional Neural Network; Recurrent Neural Network; Long Short-Term Memory
Publisher: MIT Press
Published: 2019-08-16T00:00:00+00:00
Layer-Wise Pretraining Using Autoencoders
In layer-wise pretraining, the initial autoencoder learns an encoding for the raw inputs to the network. Once this encoding has been learned, the units in the hidden encoding layer are fixed, and the output (decoding) layer is thrown away. Then a second autoencoder is trained—but this autoencoder is trained to reconstruct the representation of the data generated by passing it through the encoding layer of the initial autoencoder. In effect, this second autoencoder is stacked on top of the encoding layer of the first autoencoder. This stacking of encoding layers is considered to be a greedy process because each encoding layer is optimized independently of the later layers; in other words, each autoencoder focuses on finding the best solution for its immediate task (learning a useful encoding for the data it must reconstruct) rather than trying to find a solution to the overall problem for the network.
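The greedy stacking described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's code: it assumes sigmoid units, a squared-error reconstruction loss, and plain batch gradient descent, and it uses the same layer widths as figure 4.5 (inputs of length 4, a first encoding of 3 units, a second of 2). Each autoencoder is trained in isolation, its decoder is thrown away, and its fixed encoder feeds the next one.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(X, n_hidden, epochs=200, lr=0.1):
    """Train one autoencoder (sigmoid units, squared-error
    reconstruction loss); discard the decoder and return the
    encoder parameters, which are then held fixed."""
    n_in = X.shape[1]
    W_enc = rng.normal(0, 0.1, (n_in, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(0, 0.1, (n_hidden, n_in))
    b_dec = np.zeros(n_in)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        H = sig(X @ W_enc + b_enc)      # encoding layer
        R = sig(H @ W_dec + b_dec)      # reconstruction of X
        # backpropagate 0.5 * ||R - X||^2 through both sigmoid layers
        dR = (R - X) * R * (1 - R)
        dH = (dR @ W_dec.T) * H * (1 - H)
        W_dec -= lr * (H.T @ dR); b_dec -= lr * dR.sum(axis=0)
        W_enc -= lr * (X.T @ dH); b_enc -= lr * dH.sum(axis=0)
    return W_enc, b_enc                 # decoding layer is thrown away

def encode(X, params):
    W, b = params
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

X = rng.random((100, 4))        # raw inputs of length 4
enc1 = train_autoencoder(X, 3)  # first autoencoder: 4 -> 3 -> 4
H1 = encode(X, enc1)            # fixed encoding of the raw data
enc2 = train_autoencoder(H1, 2) # second, stacked: 3 -> 2 -> 3
H2 = encode(H1, enc2)           # final greedy representation
```

Note that `train_autoencoder` for the second layer never sees the raw inputs or the first decoder; each layer greedily optimizes only its own reconstruction task, exactly as the text describes.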
Once a sufficient number of encoding layers have been trained, a tuning phase can be applied. In the tuning phase, a final network layer is trained to predict the target output for the network. Unlike in the pretraining of the earlier layers of the network, the target output for the final layer is different from the input vector and is specified in the training dataset. The simplest form of tuning keeps the pretrained layers frozen (i.e., the weights in the pretrained layers don’t change during the tuning); however, it is also feasible to train the entire network during the tuning phase. If the entire network is trained during tuning, then the layer-wise pretraining is best understood as finding useful initial weights for the earlier layers in the network. Also, it is not necessary that the final prediction model trained during tuning be a neural network. It is quite possible to take the representations of the data generated by the layer-wise pretraining and use them as the input representation for a completely different type of machine learning algorithm, for example, a support vector machine or a nearest neighbor algorithm. This scenario is a very transparent example of how neural networks learn useful representations of data prior to the final prediction task being learned. Strictly speaking, the term pretraining describes only the layer-wise training of the autoencoders; however, the term is often used to refer to both the layer-wise training stage and the tuning stage of the model.
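The last option mentioned above, handing the pretrained representation to a non-neural model, can be sketched as follows. This is a hedged illustration, not the book's code: the frozen "pretrained" encoder is stood in for by a fixed random projection with a sigmoid (in practice those weights would come from the layer-wise autoencoder training), and the final predictor is a simple 1-nearest-neighbor classifier operating in the learned representation space.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the fixed, pretrained encoding layers: a frozen
# projection followed by a sigmoid. Its weights do not change
# during tuning.
W_frozen = rng.normal(0, 1.0, (4, 2))
encode = lambda X: 1.0 / (1.0 + np.exp(-(X @ W_frozen)))

# Labeled training data: unlike in pretraining, the targets come
# from the training dataset rather than from the inputs themselves.
X_train = rng.random((60, 4))
y_train = (X_train.sum(axis=1) > 2.0).astype(int)  # toy labels

def predict(X_new):
    """1-nearest-neighbor prediction in the encoded space: each new
    point gets the label of its closest encoded training point."""
    Z_new, Z_train = encode(X_new), encode(X_train)
    dists = ((Z_new[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=-1)
    return y_train[dists.argmin(axis=1)]

X_test = rng.random((10, 4))
y_pred = predict(X_test)
```

The neural network contributes only the representation; the prediction itself is made by a different learning algorithm entirely, which is what makes this scenario such a transparent example of representation learning.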
Figure 4.5 shows the stages in layer-wise pretraining. The left panel illustrates the training of the initial autoencoder, where an encoding layer (the black circles) of three units attempts to learn a useful representation for the task of reconstructing an input vector of length 4. The middle panel shows the training of a second autoencoder stacked on top of the encoding layer of the first. In this autoencoder, a hidden layer of two units attempts to learn an encoding for an input vector of length 3 (which is itself an encoding of a vector of length 4).