The Hundred-Page Machine Learning Book by Andriy Burkov
Author: Andriy Burkov
Language: eng
Format: mobi
Published: 2019-07-04T22:24:38+00:00
6.2.2 Recurrent Neural Network
Recurrent neural networks (RNNs) are used to label, classify, or generate sequences. A sequence is a matrix, each row of which is a feature vector and the order of rows matters. To label a sequence is to predict a class for each feature vector in a sequence. To classify a sequence is to predict a class for the entire sequence. To generate a sequence is to output another sequence (of a possibly different length) somehow relevant to the input sequence.
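As a concrete illustration of these definitions (a toy sketch with made-up numbers, not an example from the book): a sequence of four 3-dimensional feature vectors is a 4×3 matrix; labeling it produces one class per row, while classifying it produces a single class for the whole matrix.

```python
import numpy as np

# A toy sequence: 4 timesteps, each row a 3-dimensional feature vector.
# The order of rows matters -- shuffling them would change the sequence.
X = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6],
              [0.7, 0.8, 0.9],
              [1.0, 1.1, 1.2]])

# Labeling the sequence: one class per feature vector (per row).
labels = np.array([0, 1, 1, 0])  # shape (4,), one label per timestep

# Classifying the sequence: a single class for the entire matrix.
sequence_class = 1

print(X.shape)       # (4, 3): 4 timesteps, 3 features each
print(labels.shape)  # (4,): one label per timestep
```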
RNNs are often used in text processing because sentences and texts are naturally sequences of either words and punctuation marks, or characters. For the same reason, recurrent neural networks are also used in speech processing.
A recurrent neural network is not feed-forward: it contains loops. The idea is that each unit $u$ of a recurrent layer $l$ has a real-valued state $h_{l,u}$. The state can be seen as the memory of the unit. In an RNN, each unit in each layer receives two inputs: a vector of states from the previous layer and the vector of states of this same layer from the previous time step.
To illustrate the idea, let’s consider the first and the second recurrent layers of an RNN. The first (leftmost) layer receives a feature vector as input. The second layer receives the output of the first layer as input.
This situation is schematically depicted in fig. 30 below.
Figure 30: The first two layers of an RNN. The input feature vector is two-dimensional; each layer has two units.
As I said above, each training example is a matrix in which each row is a feature vector. For simplicity, let's illustrate this matrix as a sequence of vectors $\mathbf{X} = [\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^{\text{length}_X}]$, where $\text{length}_X$ is the length of the input sequence. If our input example is a text sentence, then feature vector $\mathbf{x}^t$ for each $t = 1, \ldots, \text{length}_X$ represents a word in the sentence at position $t$.
As depicted in fig. 30, in an RNN, the feature vectors from an input example are “read” by the neural network sequentially in the order of the timesteps. The index $t$ denotes a timestep. To update the state $h_{l,u}^t$ at each timestep $t$ in each unit $u$ of each layer $l$, we first calculate a linear combination of the input feature vector with the state vector $\mathbf{h}_l^{t-1}$ of this same layer from the previous timestep. The linear combination of the two vectors is calculated using two parameter vectors $\mathbf{w}_{l,u}$, $\mathbf{u}_{l,u}$ and a parameter $b_{l,u}$. The value of $h_{l,u}^t$ is then obtained by applying activation function $g_1$ to the result of the linear combination:
$$h_{l,u}^t \leftarrow g_1(\mathbf{w}_{l,u}\mathbf{x}^t + \mathbf{u}_{l,u}\mathbf{h}_l^{t-1} + b_{l,u}).$$
A typical choice for function $g_1$ is $\tanh$. The output $\mathbf{y}_l^t$ is typically a vector calculated for the whole layer at once. To obtain $\mathbf{y}_l^t$, we use activation function $g_2$ that takes a vector as input and returns a different vector of the same dimensionality. The function $g_2$ is applied to a linear combination of the state vector values calculated using a parameter matrix $\mathbf{V}_l$ and a parameter vector $\mathbf{c}_l$:
$$\mathbf{y}_l^t \leftarrow g_2(\mathbf{V}_l \mathbf{h}_l^t + \mathbf{c}_l).$$
In classification, a typical choice for $g_2$ is the softmax function:
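The recurrence above can be sketched in NumPy as follows (a minimal illustration: the dimensions, random parameter values, and variable names are assumptions for the sake of the example, not from the book; all units of one layer are updated at once as a matrix operation):

```python
import numpy as np

def softmax(z):
    # exp of each component, normalized so the outputs sum to 1;
    # subtracting max(z) avoids overflow and does not change the result
    e = np.exp(z - z.max())
    return e / e.sum()

# Illustrative dimensions: 2-dimensional input, one layer of 2 units (as in fig. 30).
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2))   # rows are the w_{l,u}: input-to-state weights
U = rng.normal(size=(2, 2))   # rows are the u_{l,u}: recurrent state-to-state weights
b = np.zeros(2)               # the b_{l,u}: per-unit biases
V = rng.normal(size=(2, 2))   # V_l: state-to-output parameter matrix
c = np.zeros(2)               # c_l: output parameter vector

X = rng.normal(size=(4, 2))   # a sequence of 4 two-dimensional feature vectors
h = np.zeros(2)               # initial state of the layer

for x_t in X:                 # "read" the feature vectors in timestep order
    # h^t <- g1(W x^t + U h^{t-1} + b), with g1 = tanh
    h = np.tanh(W @ x_t + U @ h + b)
    # y^t <- g2(V h^t + c), with g2 = softmax for classification
    y = softmax(V @ h + c)

print(y)  # a probability vector over classes at the final timestep
```

Note that `h` is carried across loop iterations: that is the loop in "a recurrent network contains loops."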
$$\sigma(\mathbf{z}) \stackrel{\text{def}}{=} [\sigma^{(1)}, \ldots, \sigma^{(D)}], \text{ where } \sigma^{(j)} \stackrel{\text{def}}{=} \frac{\exp(z^{(j)})}{\sum_{k=1}^{D} \exp(z^{(k)})},$$
where $z^{(j)}$ is the $j$-th component of the input vector $\mathbf{z}$.
The softmax function is a generalization of the sigmoid function to multidimensional outputs. It has the property that $\sum_{j=1}^{D} \sigma^{(j)} = 1$ and $\sigma^{(j)} > 0$ for all $j$.
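Both properties, and the relation to the sigmoid, are easy to verify numerically (a short sketch; the input values are arbitrary):

```python
import numpy as np

def softmax(z):
    # exp of each component, normalized so the outputs sum to 1;
    # shifting by max(z) avoids overflow and leaves the result unchanged
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([2.0, -1.0, 0.5])
s = softmax(z)

print(s)        # every entry is strictly positive
print(s.sum())  # the entries sum to 1

# For D = 2, softmax reduces to the sigmoid:
# softmax([z, 0])[0] = exp(z) / (exp(z) + 1) = 1 / (1 + exp(-z))
print(softmax(np.array([2.0, 0.0]))[0])
print(1.0 / (1.0 + np.exp(-2.0)))  # same value
```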