Deep Belief Nets in C++ and CUDA C: Volume 1 by Timothy Masters



The following notation will be used:

W   Weight matrix, a column for each visible neuron and a row for each hidden neuron

b   Column vector of visible neuron biases

c   Column vector of hidden neuron biases

K   The number of Monte Carlo iterations to perform

x   The training case being processed (column vector)

qData   Vector of probabilities under the data distribution that each hidden neuron will be one (as opposed to zero)

hData   Hidden neuron activation vector under the data distribution, zero or one

pModel   Vector of reconstruction probabilities under the model distribution that each visible neuron will be one (as opposed to zero)

vModel   Reconstructed visible neuron activation vector, zero or one

qModel   Vector of probabilities under the model distribution that each hidden neuron will be one (as opposed to zero)

hModel   Hidden neuron activation vector under the model distribution, zero or one

It is to be understood that p is a vector whose length equals the number of inputs (visible neurons), containing probabilities computed by Equation 3-2 or 3-4. Each element of v is individually sampled as zero or one according to the corresponding probability. The hidden neuron probabilities and activations are defined similarly.
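As a concrete illustration of this element-wise sampling, here is a minimal C++ sketch. The function name sample_bernoulli and the use of std::mt19937 are illustrative assumptions, not code from the book.

// Minimal sketch (not the book's code): sample a 0/1 activation vector
// from a vector of probabilities, one independent draw per neuron.
#include <cstddef>
#include <random>
#include <vector>

std::vector<double> sample_bernoulli(const std::vector<double> &p,
                                     std::mt19937 &rng)
{
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::vector<double> v(p.size());
    for (std::size_t i = 0; i < p.size(); ++i)
        v[i] = (unif(rng) < p[i]) ? 1.0 : 0.0;   // activation is 1 with probability p[i]
    return v;
}

The training algorithm for a single case then proceeds as follows: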

vData = x

qData = f(c + W vData)   (Equation 3-3)

Optionally compute the reconstruction error using the slow, accurate method.

qModel = qData   (the MC chain loop below initializes from the data)

k = 0
while k < K   (K must be at least 1)
    Sample hModel from qModel   (this sampling is critical; q itself must not be used)
    pModel = f(b + W′ hModel)   (Equation 3-4)
    If k = 0, optionally compute the reconstruction error using the fast method.
    if mean field
        qModel = f(c + W pModel)
    else
        Sample vModel from pModel
        qModel = f(c + W vModel)
    k = k + 1
end while

if mean field
    Visible bias gradient = pModel − vData
    Hidden bias gradient = qModel − qData
    Weight gradient = qModel pModel′ − qData vData′   (this product is a matrix)
else
    Visible bias gradient = vModel − vData
    Sample hData from qData
    Hidden bias gradient = qModel − hData
    Weight gradient = qModel vModel′ − hData vData′
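Before turning to some remarks about this algorithm, here is a minimal C++ sketch of one CD-K training step for a single case, following the stochastic (non-mean-field) branch of the pseudocode above. The struct and function names, the row-major storage of W, and the use of the logistic function for f are my assumptions for illustration, not the book's actual implementation.

// A sketch of one CD-K step, under the assumptions stated above.
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

static double logistic(double a) { return 1.0 / (1.0 + std::exp(-a)); }

struct RBM {
    std::size_t nv, nh;          // number of visible and hidden neurons
    std::vector<double> W;       // nh x nv weight matrix, row-major (one row per hidden neuron)
    std::vector<double> b, c;    // visible (nv) and hidden (nh) biases
};

// One CD-K step for a single training case x; gradients are accumulated
// into gW, gb, gc (same shapes as W, b, c; assumed sized and zeroed by the caller).
void cd_step(const RBM &r, const std::vector<double> &x, int K,
             std::vector<double> &gW, std::vector<double> &gb,
             std::vector<double> &gc, std::mt19937 &rng)
{
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::vector<double> qData(r.nh), qModel(r.nh), hModel(r.nh);
    std::vector<double> pModel(r.nv), vModel(r.nv), hData(r.nh);

    // qData = f(c + W x)   (Equation 3-3)
    for (std::size_t i = 0; i < r.nh; ++i) {
        double a = r.c[i];
        for (std::size_t j = 0; j < r.nv; ++j)
            a += r.W[i * r.nv + j] * x[j];
        qData[i] = logistic(a);
    }

    qModel = qData;                               // the MC chain starts from the data
    for (int k = 0; k < K; ++k) {
        for (std::size_t i = 0; i < r.nh; ++i)    // sample hModel from qModel
            hModel[i] = (unif(rng) < qModel[i]) ? 1.0 : 0.0;

        // pModel = f(b + W' hModel)   (Equation 3-4)
        for (std::size_t j = 0; j < r.nv; ++j) {
            double a = r.b[j];
            for (std::size_t i = 0; i < r.nh; ++i)
                a += r.W[i * r.nv + j] * hModel[i];
            pModel[j] = logistic(a);
        }

        for (std::size_t j = 0; j < r.nv; ++j)    // sample vModel from pModel
            vModel[j] = (unif(rng) < pModel[j]) ? 1.0 : 0.0;

        // qModel = f(c + W vModel)
        for (std::size_t i = 0; i < r.nh; ++i) {
            double a = r.c[i];
            for (std::size_t j = 0; j < r.nv; ++j)
                a += r.W[i * r.nv + j] * vModel[j];
            qModel[i] = logistic(a);
        }
    }

    for (std::size_t i = 0; i < r.nh; ++i)        // sample hData from qData
        hData[i] = (unif(rng) < qData[i]) ? 1.0 : 0.0;

    // Gradients: bias differences plus the outer product for the weights
    for (std::size_t j = 0; j < r.nv; ++j)
        gb[j] += vModel[j] - x[j];                // visible bias gradient
    for (std::size_t i = 0; i < r.nh; ++i) {
        gc[i] += qModel[i] - hData[i];            // hidden bias gradient
        for (std::size_t j = 0; j < r.nv; ++j)
            gW[i * r.nv + j] += qModel[i] * vModel[j] - hData[i] * x[j];
    }
}

For the mean-field variant, one would propagate pModel instead of the sampled vModel inside the loop and use pModel, qData, and qModel directly in the gradient expressions, as in the first branch of the pseudocode.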

A few things should be noted about this algorithm. First, the weight gradient is a matrix that, like W, has a row for each hidden neuron and a column for each visible neuron. The products given in Equation 3-12 are efficiently expressed in the algorithm as the outer product of a column vector of hidden-neuron terms and a row vector of visible-neuron terms.

There are two different places in the algorithm in which one can compute the reconstruction error. This error has no use in the training algorithm itself, but it is nice to display it for the user. Regardless of which place we choose, Equations 3-3 and 3-4 are used to jump from the visible layer to the hidden layer and then bounce back to the visible layer. The reconstruction error will compare the original data with the reconstructed data. The only question is whether we use the raw probabilities from these equations or samples based on the probabilities.
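The following sketch illustrates that comparison as a mean squared reconstruction error; mean squared error is one common choice of metric, though the book's exact formula may differ. The vector passed as recon may hold either the raw probabilities or the sampled activations, depending on which of the two options is chosen.

// Illustrative sketch: mean squared difference between the original case x
// and its reconstruction (probabilities or sampled 0/1 values).
#include <cstddef>
#include <vector>

double reconstruction_error(const std::vector<double> &x,
                            const std::vector<double> &recon)
{
    double err = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j) {
        double d = x[j] - recon[j];
        err += d * d;                 // squared error for this visible neuron
    }
    return err / x.size();            // average over all visible neurons
}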


