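Deep Learning: Restricted Boltzmann Machines (RBM)

Energy-Based Models (EBM)

Energy-based models assign a scalar energy to each configuration of the variables of interest. Learning means modifying the energy function so that it has desirable properties; for example, we would like plausible or desirable configurations to have low energy. Energy-based probabilistic models define a probability distribution through the energy function:

$$p(x) = \frac{e^{-E(x)}}{Z}$$

The normalizing factor Z is called the partition function, by analogy with physical systems:

$$Z = \sum_x e^{-E(x)}$$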
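As a minimal sketch of this definition, the snippet below turns a small energy table into probabilities via p(x) = e^{-E(x)} / Z. The four configurations and their energy values are made up for illustration; they do not come from the tutorial.

```python
import math

# Hypothetical energies for four configurations (illustrative values only).
energies = {"a": 0.5, "b": 1.0, "c": 2.0, "d": 4.0}


def boltzmann(energies):
    """Return p(x) = exp(-E(x)) / Z for every configuration x."""
    weights = {x: math.exp(-e) for x, e in energies.items()}
    z = sum(weights.values())  # partition function Z
    return {x: w / z for x, w in weights.items()}


probs = boltzmann(energies)
for x, p in sorted(probs.items()):
    print(x, round(p, 4))
```

Low-energy configurations receive the largest share of the probability mass, which is exactly the property learning tries to exploit.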

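An energy-based model can be learned by performing (stochastic) gradient descent on the empirical negative log-likelihood of the training data. As with logistic regression, we first define the log-likelihood and then define the loss as the negative log-likelihood:

$$\mathcal{L}(\theta, \mathcal{D}) = \frac{1}{N} \sum_{x^{(i)} \in \mathcal{D}} \log p(x^{(i)}), \qquad \ell(\theta, \mathcal{D}) = -\mathcal{L}(\theta, \mathcal{D})$$

using the stochastic gradient $-\partial \log p(x^{(i)}) / \partial \theta$, where $\theta$ denotes the parameters of the model.

EBMs with Hidden Units

In many cases we do not observe the example x fully, or we want to introduce non-observed variables to increase the expressive power of the model. We therefore consider an observed part (still denoted x) and a hidden part h:

$$P(x) = \sum_h P(x, h) = \sum_h \frac{e^{-E(x, h)}}{Z}$$

To write this in a form similar to the case without hidden units, we introduce the free energy (a notion borrowed from physics):

$$F(x) = -\log \sum_h e^{-E(x, h)}$$

This gives

$$P(x) = \frac{e^{-F(x)}}{Z}, \qquad Z = \sum_x e^{-F(x)}$$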
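The free energy F(x) = -log Σ_h e^{-E(x,h)} can be checked numerically on a model small enough to enumerate. The sketch below uses a made-up joint energy over two binary visible variables and one binary hidden variable (the coefficients are arbitrary assumptions) and confirms that the marginal computed from E(x, h) matches e^{-F(x)} / Z.

```python
import math
from itertools import product


def energy(x, h):
    """Hypothetical joint energy E(x, h); the coefficients are arbitrary."""
    return -(0.8 * x[0] * h + 0.3 * x[1] * h + 0.2 * x[0] - 0.5 * x[1])


xs = list(product([0, 1], repeat=2))  # all visible configurations
hs = [0, 1]                           # all hidden configurations


def free_energy(x):
    """F(x) = -log sum_h exp(-E(x, h))."""
    return -math.log(sum(math.exp(-energy(x, h)) for h in hs))


# Partition function computed from the joint energy and from the free energy.
z_joint = sum(math.exp(-energy(x, h)) for x in xs for h in hs)
z_free = sum(math.exp(-free_energy(x)) for x in xs)

# Marginal P(x) computed both ways.
p_marginal = {x: sum(math.exp(-energy(x, h)) for h in hs) / z_joint for x in xs}
p_free = {x: math.exp(-free_energy(x)) / z_free for x in xs}
```

Both routes give the same distribution, so the hidden variables can be summed out once and hidden inside F.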

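The gradient of the negative log-likelihood of the data then takes a particularly interesting form:

$$-\frac{\partial \log p(x)}{\partial \theta} = \frac{\partial F(x)}{\partial \theta} - \sum_{\tilde{x}} p(\tilde{x}) \frac{\partial F(\tilde{x})}{\partial \theta}$$

Note that this gradient has two terms, referred to as the positive phase and the negative phase. "Positive" and "negative" do not refer to the sign of each term in the equation but to their effect on the probability density defined by the model. The first term increases the probability of the training data (by lowering the corresponding free energy), while the second term decreases the probability of samples generated by the model.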
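The two-phase form of the gradient, dF(x)/dθ minus the model expectation of dF/dθ, can be verified against finite differences on a toy model. The sketch below assumes a hypothetical one-parameter free energy F(x) = -θx + 0.1x² over the states {0, 1, 2, 3}; neither the states nor the functional form come from the tutorial.

```python
import math

STATES = [0, 1, 2, 3]


def free_energy(x, theta):
    """Hypothetical one-parameter free energy (illustrative choice)."""
    return -theta * x + 0.1 * x * x


def dF_dtheta(x, theta):
    """Derivative of the free energy above with respect to theta."""
    return -float(x)


def probs(theta):
    weights = [math.exp(-free_energy(s, theta)) for s in STATES]
    z = sum(weights)
    return [w / z for w in weights]


def nll(x, theta):
    """Negative log-likelihood of a single observation x."""
    return -math.log(probs(theta)[STATES.index(x)])


def nll_grad(x, theta):
    """Positive phase minus negative phase."""
    positive = dF_dtheta(x, theta)
    negative = sum(p * dF_dtheta(s, theta) for s, p in zip(STATES, probs(theta)))
    return positive - negative


theta, x, eps = 0.3, 2, 1e-6
analytic = nll_grad(x, theta)
numeric = (nll(x, theta + eps) - nll(x, theta - eps)) / (2 * eps)
print(analytic, numeric)
```

Here the negative phase is an exact sum because there are only four states; for a real RBM this sum is intractable.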

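This gradient is usually hard to compute analytically, because it involves

$$E_P\left[\frac{\partial F(x)}{\partial \theta}\right]$$

This is an expectation over all possible configurations of the input x (under the distribution P defined by the model).

To make the computation tractable, the first step is to estimate the expectation with a fixed number of model samples. The samples used to estimate the negative-phase gradient are called negative particles, denoted $\mathcal{N}$. The gradient can then be written as

$$-\frac{\partial \log p(x)}{\partial \theta} \approx \frac{\partial F(x)}{\partial \theta} - \frac{1}{|\mathcal{N}|} \sum_{\tilde{x} \in \mathcal{N}} \frac{\partial F(\tilde{x})}{\partial \theta}$$

Ideally, the elements $\tilde{x}$ of $\mathcal{N}$ are sampled according to P, i.e. we are doing Monte Carlo. With this formula we almost have a practical stochastic algorithm for learning an EBM. The only missing ingredient is how to obtain the negative particles. The statistical literature abounds with sampling methods, and Markov chain Monte Carlo methods are especially well suited to restricted Boltzmann machines, a specific type of EBM.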
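To illustrate the idea of negative particles, the sketch below estimates the negative-phase expectation from samples and compares it with the exact value. It assumes a hypothetical free energy F(x) = -θx + 0.1x² over four states. Because the toy model is tiny, the particles can be drawn exactly from P; a real RBM would obtain them from a Markov chain instead.

```python
import math
import random

STATES = [0, 1, 2, 3]
THETA = 0.3  # hypothetical parameter value


def free_energy(x):
    """Hypothetical free energy F(x) = -theta * x + 0.1 * x^2."""
    return -THETA * x + 0.1 * x * x


def dF_dtheta(x):
    return -float(x)


weights = [math.exp(-free_energy(s)) for s in STATES]
z = sum(weights)
p = [w / z for w in weights]

# Exact negative phase: expectation of dF/dtheta under the model.
exact_negative_phase = sum(pi * dF_dtheta(s) for s, pi in zip(STATES, p))

# Monte Carlo estimate from a fixed number of negative particles.
rng = random.Random(0)
negative_particles = rng.choices(STATES, weights=p, k=20000)
estimated_negative_phase = (
    sum(dF_dtheta(s) for s in negative_particles) / len(negative_particles)
)
print(exact_negative_phase, estimated_negative_phase)
```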

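Restricted Boltzmann Machines

Boltzmann machines (BMs) are a particular form of log-linear Markov random field (MRF), i.e. one for which the energy function is linear in its free parameters. To make them powerful enough to represent complicated distributions (going from the limited parametric setting to a non-parametric one), we assume that some of the variables are never observed (they are hidden). With more hidden variables (hidden units) we increase the modeling capacity of the Boltzmann machine. Restricted Boltzmann machines go one step further and remove the visible-visible and hidden-hidden connections, as depicted below.

[Figure: graph of an RBM, with connections only between the visible layer and the hidden layer]

The energy function is defined as

$$E(v, h) = -b^\top v - c^\top h - h^\top W v$$

W holds the weights connecting the hidden and visible units, and b and c are the biases of the visible and hidden layers respectively.

This translates into the following free energy formula:

$$F(v) = -b^\top v - \sum_i \log \sum_{h_i} e^{h_i (c_i + W_i v)}$$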
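For binary hidden units the inner sum over h_i ∈ {0, 1} has only two terms, so the free energy becomes F(v) = -b'v - Σ_i log(1 + e^{c_i + W_i v}). The sketch below checks this closed form against a brute-force sum over all hidden configurations. The parameter values are arbitrary, and W is indexed as W[i][j] (hidden unit i, visible unit j) to match the equations, which is the transpose of the (n_visible, n_hidden) layout used by the Theano code.

```python
import math
from itertools import product

# Hypothetical parameters of a tiny RBM: 3 visible units, 2 hidden units.
W = [[0.5, -0.2, 0.1],
     [-0.3, 0.4, 0.7]]   # W[i][j] couples hidden unit i with visible unit j
b = [0.1, -0.1, 0.2]     # visible biases
c = [0.0, 0.3]           # hidden biases


def energy(v, h):
    """E(v, h) = -b'v - c'h - h'Wv."""
    bv = sum(bj * vj for bj, vj in zip(b, v))
    ch = sum(ci * hi for ci, hi in zip(c, h))
    hWv = sum(h[i] * W[i][j] * v[j] for i in range(len(h)) for j in range(len(v)))
    return -bv - ch - hWv


def free_energy_brute_force(v):
    """F(v) = -log sum_h exp(-E(v, h)), enumerating every hidden state."""
    hidden_states = product([0, 1], repeat=len(c))
    return -math.log(sum(math.exp(-energy(v, h)) for h in hidden_states))


def free_energy_closed_form(v):
    """F(v) = -b'v - sum_i log(1 + exp(c_i + W_i v))."""
    bv = sum(bj * vj for bj, vj in zip(b, v))
    hidden_term = sum(
        math.log1p(math.exp(c[i] + sum(W[i][j] * v[j] for j in range(len(v)))))
        for i in range(len(c))
    )
    return -bv - hidden_term


for v in product([0, 1], repeat=len(b)):
    print(v, free_energy_brute_force(v), free_energy_closed_form(v))
```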

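Because of the special structure of the RBM, the hidden units are conditionally independent given the visible units, and vice versa, so we can write

$$p(h \mid v) = \prod_i p(h_i \mid v), \qquad p(v \mid h) = \prod_j p(v_j \mid h)$$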
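The factorization can be confirmed by enumeration on a tiny RBM. The sketch below assumes binary units, for which each factor is p(h_i = 1 | v) = sigm(c_i + W_i v), and compares the product of factors with p(h | v) computed directly from the energy. All parameter values are made up.

```python
import math
from itertools import product

# Hypothetical parameters: 3 visible units, 2 hidden units.
W = [[0.5, -0.2, 0.1],
     [-0.3, 0.4, 0.7]]   # W[i][j]: hidden unit i, visible unit j
b = [0.1, -0.1, 0.2]
c = [0.0, 0.3]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def energy(v, h):
    """E(v, h) = -b'v - c'h - h'Wv."""
    bv = sum(bj * vj for bj, vj in zip(b, v))
    ch = sum(ci * hi for ci, hi in zip(c, h))
    hWv = sum(h[i] * W[i][j] * v[j] for i in range(len(h)) for j in range(len(v)))
    return -bv - ch - hWv


def p_h_given_v_brute_force(h, v):
    """p(h | v) from the joint energy, normalizing over every hidden state."""
    numerator = math.exp(-energy(v, h))
    denominator = sum(
        math.exp(-energy(v, other)) for other in product([0, 1], repeat=len(c))
    )
    return numerator / denominator


def p_h_given_v_factorized(h, v):
    """Product over i of p(h_i | v), with p(h_i = 1 | v) = sigm(c_i + W_i v)."""
    prob = 1.0
    for i, hi in enumerate(h):
        on = sigmoid(c[i] + sum(W[i][j] * v[j] for j in range(len(v))))
        prob *= on if hi == 1 else 1.0 - on
    return prob
```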

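Update Equations with Binary Units

For binary units ($v_j, h_i \in \{0, 1\}$) the conditionals are $P(h_i = 1 \mid v) = \mathrm{sigm}(c_i + W_i v)$ and $P(v_j = 1 \mid h) = \mathrm{sigm}(b_j + (W^\top)_j h)$, where sigm is the logistic sigmoid, and the free energy simplifies to $F(v) = -b^\top v - \sum_i \log(1 + e^{c_i + W_i v})$. Combining this with the sampled gradient formula, we obtain the following log-likelihood gradients, where $v^{(t)}$ is a training example and $E_v$ is the expectation under the model:

$$-\frac{\partial \log p(v)}{\partial W_{ij}} = E_v\left[p(h_i = 1 \mid v)\, v_j\right] - v_j^{(t)}\, \mathrm{sigm}(W_i v^{(t)} + c_i)$$

$$-\frac{\partial \log p(v)}{\partial c_i} = E_v\left[p(h_i = 1 \mid v)\right] - \mathrm{sigm}(W_i v^{(t)} + c_i)$$

$$-\frac{\partial \log p(v)}{\partial b_j} = E_v\left[v_j\right] - v_j^{(t)}$$

Sampling in an RBM

Samples of p(x) can be obtained by running a Markov chain to convergence, using Gibbs sampling as the transition operator. Gibbs sampling of the joint of N random variables $S = (S_1, \ldots, S_N)$ proceeds through N sampling sub-steps of the form $S_i \sim p(S_i \mid S_{-i})$, where $S_{-i}$ contains the other N-1 variables.

For an RBM, S consists of the visible and hidden units. Because these are conditionally independent, we can use block Gibbs sampling: the visible units are sampled simultaneously given fixed values of the hidden units, and likewise the hidden units are sampled simultaneously given the visible units. One step of the chain is therefore $h^{(n+1)} \sim \mathrm{sigm}(W v^{(n)} + c)$ followed by $v^{(n+1)} \sim \mathrm{sigm}(W^\top h^{(n+1)} + b)$.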
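A block Gibbs step can be sketched in plain Python as follows. The function names mirror those of the Theano class for readability, but this is an independent illustration with made-up parameters, using W[i][j] for hidden unit i and visible unit j.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def sample_h_given_v(v, W, c, rng):
    """Sample all hidden units at once given the visible vector v."""
    means = [
        sigmoid(c[i] + sum(W[i][j] * v[j] for j in range(len(v))))
        for i in range(len(c))
    ]
    return [1 if rng.random() < m else 0 for m in means]


def sample_v_given_h(h, W, b, rng):
    """Sample all visible units at once given the hidden vector h."""
    means = [
        sigmoid(b[j] + sum(W[i][j] * h[i] for i in range(len(h))))
        for j in range(len(b))
    ]
    return [1 if rng.random() < m else 0 for m in means]


def gibbs_vhv(v, W, b, c, rng):
    """One block Gibbs step: visible -> hidden -> visible."""
    h = sample_h_given_v(v, W, c, rng)
    return sample_v_given_h(h, W, b, rng)


# Hypothetical parameters: 3 visible units, 2 hidden units.
W = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.7]]
b = [0.1, -0.1, 0.2]
c = [0.0, 0.3]

rng = random.Random(0)
chain = [[1, 0, 1]]
for _ in range(5):
    chain.append(gibbs_vhv(chain[-1], W, b, c, rng))
```

Each half-step needs only one pass over the units of a layer, which is what makes block sampling cheap compared with updating one variable at a time.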

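In theory, each parameter update in the learning process would require running one such chain to convergence. Needless to say, that would be prohibitively expensive. Several algorithms have therefore been devised for RBMs in order to sample efficiently during learning.

Contrastive Divergence (CD-k)

Contrastive divergence uses two tricks to speed up the sampling process:

Since we eventually want $p(v) \approx p_{train}(v)$ (the true distribution of the data), we initialize the Markov chain with a training example, i.e. from a distribution that is expected to be close to p, so that the chain is already close to having converged to its final distribution p.

CD does not wait for the chain to converge. Samples are obtained after only k steps of Gibbs sampling. In practice, k = 1 works surprisingly well.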
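The two tricks can be put together in a short plain-Python sketch of a CD-k update for a single training vector. It descends on F(v0) - F(vk) for a binary RBM, treating the chain end vk as a constant. The helper names, the per-example update and the all-zero initial parameters are assumptions made for illustration, not the tutorial's implementation.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def hidden_means(v, W, c):
    return [sigmoid(c[i] + sum(W[i][j] * v[j] for j in range(len(v))))
            for i in range(len(c))]


def visible_means(h, W, b):
    return [sigmoid(b[j] + sum(W[i][j] * h[i] for i in range(len(h))))
            for j in range(len(b))]


def sample(means, rng):
    return [1 if rng.random() < m else 0 for m in means]


def cd_k_update(v0, W, b, c, rng, k=1, lr=0.1):
    """One CD-k update from training vector v0; parameters change in place."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ph0 = hidden_means(v0, W, c)   # positive phase statistics
    h = sample(ph0, rng)           # trick 1: the chain starts at the data
    for _ in range(k):             # trick 2: only k Gibbs steps
        vk = sample(visible_means(h, W, b), rng)
        phk = hidden_means(vk, W, c)
        h = sample(phk, rng)
    for i in range(len(c)):
        for j in range(len(b)):
            W[i][j] += lr * (ph0[i] * v0[j] - phk[i] * vk[j])
        c[i] += lr * (ph0[i] - phk[i])
    for j in range(len(b)):
        b[j] += lr * (v0[j] - vk[j])
    return vk


rng = random.Random(0)
W = [[0.0] * 3 for _ in range(2)]  # 2 hidden units, 3 visible units
b = [0.0] * 3
c = [0.0] * 2
v0 = [1, 0, 1]
vk = cd_k_update(v0, W, b, c, rng, k=1, lr=0.1)
```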

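Persistent CD

Persistent CD uses another approximation for sampling from p(v, h). It relies on a single Markov chain with a persistent state, i.e. the chain is not restarted for each observed example. For each parameter update, we obtain new samples by simply running the chain for k steps. The state of the chain is then preserved for subsequent updates.

The general intuition is that if the parameter updates are small enough compared to the mixing rate of the chain, the Markov chain should be able to keep up with the changes in the model.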
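The difference from CD lies only in where the negative-phase chain starts. In the hedged sketch below, the hidden state of the chain is passed in, advanced by k steps, and handed back to the caller for the next update. The helper names and toy data are assumptions for illustration.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def hidden_means(v, W, c):
    return [sigmoid(c[i] + sum(W[i][j] * v[j] for j in range(len(v))))
            for i in range(len(c))]


def visible_means(h, W, b):
    return [sigmoid(b[j] + sum(W[i][j] * h[i] for i in range(len(h))))
            for j in range(len(b))]


def sample(means, rng):
    return [1 if rng.random() < m else 0 for m in means]


def pcd_update(v0, chain_h, W, b, c, rng, k=1, lr=0.1):
    """One PCD-k update; returns the new persistent hidden state."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ph0 = hidden_means(v0, W, c)   # positive phase still uses the data
    h = chain_h                    # negative phase continues the old chain
    for _ in range(k):
        vk = sample(visible_means(h, W, b), rng)
        phk = hidden_means(vk, W, c)
        h = sample(phk, rng)
    for i in range(len(c)):
        for j in range(len(b)):
            W[i][j] += lr * (ph0[i] * v0[j] - phk[i] * vk[j])
        c[i] += lr * (ph0[i] - phk[i])
    for j in range(len(b)):
        b[j] += lr * (v0[j] - vk[j])
    return h


rng = random.Random(0)
W = [[0.0] * 3 for _ in range(2)]
b = [0.0] * 3
c = [0.0] * 2
data = [[1, 0, 1], [1, 1, 0], [1, 0, 1]]  # made-up training vectors
chain_h = [0, 0]                           # initialized once, never reset
for epoch in range(5):
    for v0 in data:
        chain_h = pcd_update(v0, chain_h, W, b, c, rng)
```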

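Implementation

We construct an RBM class. The network parameters can either be initialized by the constructor or passed in as arguments. The latter is useful when an RBM is used as a building block of a deep network, in which case the weight matrix and the hidden layer bias are shared with the corresponding sigmoidal layer of an MLP.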

import numpy
import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams


class RBM(object):
    """Restricted Boltzmann Machine (RBM)"""

    def __init__(self, input=None, n_visible=784, n_hidden=500,
                 W=None, hbias=None, vbias=None,
                 numpy_rng=None, theano_rng=None):
        """
        RBM constructor. Defines the parameters of the model along with
        basic operations for inferring hidden from visible (and vice-versa),
        as well as for performing CD updates.

        :param input: None for standalone RBMs or symbolic variable if RBM is
        part of a larger graph.

        :param n_visible: number of visible units

        :param n_hidden: number of hidden units

        :param W: None for standalone RBMs or symbolic variable pointing to a
        shared weight matrix in case RBM is part of a DBN network; in a DBN,
        the weights are shared between RBMs and layers of a MLP

        :param hbias: None for standalone RBMs or symbolic variable pointing
        to a shared hidden units bias vector in case RBM is part of a
        different network

        :param vbias: None for standalone RBMs or a symbolic variable
        pointing to a shared visible units bias
        """
        self.n_visible = n_visible
        self.n_hidden = n_hidden

        if numpy_rng is None:
            # create a number generator
            numpy_rng = numpy.random.RandomState(1234)

        if theano_rng is None:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))

        if W is None:
            # W is initialized with `initial_W`, which is uniformly
            # sampled between -4*sqrt(6./(n_visible+n_hidden)) and
            # 4*sqrt(6./(n_hidden+n_visible)); the output of uniform is
            # converted using asarray to dtype theano.config.floatX so
            # that the code is runnable on GPU
            initial_W = numpy.asarray(
                numpy_rng.uniform(
                    low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    size=(n_visible, n_hidden)
                ),
                dtype=theano.config.floatX
            )
            # theano shared variables for weights and biases
            W = theano.shared(value=initial_W, name='W', borrow=True)

        if hbias is None:
            # create shared variable for hidden units bias
            hbias = theano.shared(
                value=numpy.zeros(n_hidden, dtype=theano.config.floatX),
                name='hbias',
                borrow=True
            )

        if vbias is None:
            # create shared variable for visible units bias
            vbias = theano.shared(
                value=numpy.zeros(n_visible, dtype=theano.config.floatX),
                name='vbias',
                borrow=True
            )

        # initialize input layer for standalone RBM or layer0 of DBN
        self.input = input
        if input is None:
            self.input = T.matrix('input')

        self.W = W
        self.hbias = hbias
        self.vbias = vbias
        self.theano_rng = theano_rng
        # **** WARNING: It is not a good idea to put things in this list
        # other than shared variables created in this function.
        self.params = [self.W, self.hbias, self.vbias]

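Next we define the functions that build the symbolic graph for the conditional distributions of the hidden and visible units.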

    def propup(self, vis):
        '''This function propagates the visible units activation upwards to
        the hidden units

        Note that we return also the pre-sigmoid activation of the
        layer. As it will turn out later, due to how Theano deals with
        optimizations, this symbolic variable will be needed to write
        down a more stable computational graph (see details in the
        reconstruction cost function)
        '''
        pre_sigmoid_activation = T.dot(vis, self.W) + self.hbias
        return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]

    def sample_h_given_v(self, v0_sample):
        ''' This function infers state of hidden units given visible units '''
        # compute the activation of the hidden units given a sample of
        # the visibles
        pre_sigmoid_h1, h1_mean = self.propup(v0_sample)
        # get a sample of the hiddens given their activation
        # Note that theano_rng.binomial returns a symbolic sample of dtype
        # int64 by default. If we want to keep our computations in floatX
        # for the GPU we need to specify to return the dtype floatX
        h1_sample = self.theano_rng.binomial(size=h1_mean.shape,
                                             n=1, p=h1_mean,
                                             dtype=theano.config.floatX)
        return [pre_sigmoid_h1, h1_mean, h1_sample]

    def propdown(self, hid):
        '''This function propagates the hidden units activation downwards to
        the visible units

        Note that we return also the pre_sigmoid_activation of the
        layer. As it will turn out later, due to how Theano deals with
        optimizations, this symbolic variable will be needed to write
        down a more stable computational graph (see details in the
        reconstruction cost function)
        '''
        pre_sigmoid_activation = T.dot(hid, self.W.T) + self.vbias
        return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]

    def sample_v_given_h(self, h0_sample):
        ''' This function infers state of visible units given hidden units '''
        # compute the activation of the visible given the hidden sample
        pre_sigmoid_v1, v1_mean = self.propdown(h0_sample)
        # get a sample of the visible given their activation
        # Note that theano_rng.binomial returns a symbolic sample of dtype
        # int64 by default. If we want to keep our computations in floatX
        # for the GPU we need to specify to return the dtype floatX
        v1_sample = self.theano_rng.binomial(size=v1_mean.shape,
                                             n=1, p=v1_mean,
                                             dtype=theano.config.floatX)
        return [pre_sigmoid_v1, v1_mean, v1_sample]

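We can then use these functions to define a Gibbs sampling step. We define two functions:

gibbs_vhv performs one step of Gibbs sampling starting from the visible units. This will be useful for sampling from the RBM.

gibbs_hvh performs one step of Gibbs sampling starting from the hidden units. This will be useful for performing CD and PCD updates.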

    def gibbs_hvh(self, h0_sample):
        ''' This function implements one step of Gibbs sampling,
            starting from the hidden state'''
        pre_sigmoid_v1, v1_mean, v1_sample = self.sample_v_given_h(h0_sample)
        pre_sigmoid_h1, h1_mean, h1_sample = self.sample_h_given_v(v1_sample)
        return [pre_sigmoid_v1, v1_mean, v1_sample,
                pre_sigmoid_h1, h1_mean, h1_sample]

    def gibbs_vhv(self, v0_sample):
        ''' This function implements one step of Gibbs sampling,
            starting from the visible state'''
        pre_sigmoid_h1, h1_mean, h1_sample = self.sample_h_given_v(v0_sample)
        pre_sigmoid_v1, v1_mean, v1_sample = self.sample_v_given_h(h1_sample)
        return [pre_sigmoid_h1, h1_mean, h1_sample,
                pre_sigmoid_v1, v1_mean, v1_sample]

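Note that we also return the pre-sigmoid activation. To understand why, you need to know a little about how Theano works. Whenever you compile a Theano function, the computational graph you pass as input is optimized for speed and stability. This is done by replacing certain parts of the graph with others. One such optimization expresses terms of the form log(sigmoid(x)) in terms of softplus. We need this optimization for the cross-entropy, because the sigmoid of numbers larger than about 30 saturates to 1 and that of numbers smaller than about -30 to 0, which would force Theano to compute log(0) and give a cost of -inf or NaN. Expressing the value in terms of softplus avoids this. The optimization usually works fine, but here we have a special case: the sigmoid is applied inside the scan op, while the log is applied outside. Theano therefore only sees log(scan(...)) instead of log(sigmoid(...)) and does not perform the optimization. We cannot simply replace the sigmoid inside scan either, because this only needs to be done on the last step. The easiest and most efficient way is therefore to also return the pre-sigmoid activation as an output of scan and to apply both the log and the sigmoid outside scan, so that Theano can catch and optimize the expression.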
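The rewrite that Theano performs relies on the identity log(sigmoid(x)) = -softplus(-x). The sketch below shows the identity in plain Python double precision, where the naive formula fails only for much larger magnitudes than the roughly ±30 mentioned for Theano. The function names are chosen here for illustration and are not Theano APIs.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_sigmoid(x):
    """log(sigmoid(x)) rewritten as -softplus(-x)."""
    return -softplus(-x)


def naive_log_sigmoid(x):
    """Direct formula; fails once exp(-x) overflows."""
    return math.log(1.0 / (1.0 + math.exp(-x)))


print(log_sigmoid(2.0), naive_log_sigmoid(2.0))  # the two agree for moderate x
print(log_sigmoid(-1000.0))                      # stays finite
try:
    naive_log_sigmoid(-1000.0)
except OverflowError as err:
    print("naive version failed:", err)
```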

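The class also has a function that computes the free energy of the model, which is needed to compute the gradient of the parameters. It works directly on the pre-sigmoid activation wx_b.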

    def free_energy(self, v_sample):
        ''' Function to compute the free energy '''
        wx_b = T.dot(v_sample, self.W) + self.hbias
        vbias_term = T.dot(v_sample, self.vbias)
        hidden_term = T.sum(T.log(1 + T.exp(wx_b)), axis=1)
        return -hidden_term - vbias_term

We then add a get_cost_updates method, whose purpose is to generate the symbolic gradients for CD-k and PCD-k updates.

    def get_cost_updates(self, lr=0.1, persistent=None, k=1):
        """This function implements one step of CD-k or PCD-k

        :param lr: learning rate used to train the RBM

        :param persistent: None for CD. For PCD, shared variable
            containing old state of Gibbs chain. This must be a shared
            variable of size (batch size, number of hidden units).

        :param k: number of Gibbs steps to do in CD-k/PCD-k

        Returns a proxy for the cost and the updates dictionary. The
        dictionary contains the update rules for weights and biases but
        also an update of the shared variable used to store the persistent
        chain, if one is used.
        """
        # compute positive phase
        pre_sigmoid_ph, ph_mean, ph_sample = self.sample_h_given_v(self.input)

        # decide how to initialize persistent chain:
        # for CD, we use the newly generated hidden sample
        # for PCD, we initialize from the old state of the chain
        if persistent is None:
            chain_start = ph_sample
        else:
            chain_start = persistent

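Note that get_cost_updates takes an argument called persistent. This lets us use the same code to implement both CD and PCD. To use PCD, persistent should refer to a shared variable that contains the state of the Gibbs chain from the previous iteration.

If persistent is None, we initialize the Gibbs chain with the hidden sample generated during the positive phase, which implements CD. Once the starting point of the chain is established, we can compute the sample at the end of the Gibbs chain, which is the sample we need to compute the gradient.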

        # perform actual negative phase
        # in order to implement CD-k/PCD-k we need to scan over the
        # function that implements one gibbs step k times.
        # Read Theano tutorial on scan for more information :
        # http://deeplearning.net/software/theano/library/scan.html
        # the scan will return the entire Gibbs chain
        (
            [pre_sigmoid_nvs, nv_means, nv_samples,
             pre_sigmoid_nhs, nh_means, nh_samples],
            updates
        ) = theano.scan(
            self.gibbs_hvh,
            # the None are place holders, saying that
            # chain_start is the initial state corresponding to the
            # 6th output
            outputs_info=[None, None, None, None, None, chain_start],
            n_steps=k,
            name="gibbs_hvh"
        )

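Once we have generated the chain, we take the sample at its end to compute the free energy of the negative phase. Note that chain_end is a symbolic Theano variable expressed in terms of the model parameters, so if we applied T.grad naively, the function would try to go through the Gibbs chain to get the gradients. This is not what we want, so we need to tell T.grad that chain_end is a constant. We do this with the consider_constant argument of T.grad.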

        # determine gradients on RBM parameters
        # note that we only need the sample at the end of the chain
        chain_end = nv_samples[-1]

        cost = T.mean(self.free_energy(self.input)) - T.mean(
            self.free_energy(chain_end))
        # We must not compute the gradient through the gibbs sampling
        gparams = T.grad(cost, self.params, consider_constant=[chain_end])

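Finally, we add the parameter updates to the updates dictionary returned by scan (which already contains the update rules for the random states of theano_rng). In the case of PCD, the dictionary should also update the shared variable that holds the state of the Gibbs chain.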

        # constructs the update dictionary
        for gparam, param in zip(gparams, self.params):
            # make sure that the learning rate is of the right dtype
            updates[param] = param - gparam * T.cast(
                lr,
                dtype=theano.config.floatX
            )
        if persistent is not None:
            # Note that this works only if persistent is a shared variable
            updates[persistent] = nh_samples[-1]
            # pseudo-likelihood is a better proxy for PCD
            monitoring_cost = self.get_pseudo_likelihood_cost(updates)
        else:
            # reconstruction cross-entropy is a better proxy for CD
            monitoring_cost = self.get_reconstruction_cost(updates,
                                                           pre_sigmoid_nvs[-1])

        return monitoring_cost, updates

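Tracking Progress

RBMs are particularly tricky to train. Because of the partition function Z, we cannot estimate the log-likelihood log(P(x)) during training, so we have no direct metric for choosing the best hyperparameters.

A few ideas are outlined here.

Inspection of Negative Samples

Negative samples obtained during training can be visualized. As learning progresses, the distribution defined by the RBM should get closer and closer to the true underlying distribution p_train(x). Negative samples should therefore look like samples from the training set. Obviously bad hyperparameters can be discarded in this way.

Visual Inspection of Filters

The filters learned by the model can be visualized. This amounts to plotting the weights of each unit as a gray-scale image (after reshaping them to a square matrix). Filters should pick out strong features in the data. While it is not clear what these features should look like for an arbitrary dataset, filters trained on MNIST generally act as stroke detectors, while filters trained on natural images resemble Gabor filters when trained together with a sparsity criterion.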

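Proxies to Likelihood

Other, more tractable functions can be used as a proxy to the likelihood. When training an RBM with PCD, one can use the pseudo-likelihood as the proxy. Pseudo-likelihood (PL) is much cheaper to compute, because it assumes that all bits are independent:

$$PL(x) = \prod_i P(x_i \mid x_{-i}), \qquad \log PL(x) = \sum_i \log P(x_i \mid x_{-i})$$

Here $x_{-i}$ denotes the set of all bits of x except bit i. The log-PL is therefore the sum of the log-probabilities of each bit $x_i$, conditioned on the state of all other bits. For MNIST this would involve summing over the 784 input dimensions, which is still rather expensive. For this reason we use the following stochastic approximation to the log-PL:

$$g = N \cdot \log P(x_i \mid x_{-i}), \quad i \sim U(0, N), \qquad E[g] = \log PL(x)$$

The expectation is taken over the uniform random choice of the index i, and N is the number of visible units. To work with binary units, we further introduce the notation $\tilde{x}_i$ for x with bit i flipped (1->0, 0->1). The log-PL of an RBM with binary units is then written as

$$\log PL(x) \approx N \cdot \log \frac{e^{-FE(x)}}{e^{-FE(x)} + e^{-FE(\tilde{x}_i)}} \approx N \cdot \log\left[\mathrm{sigm}(FE(\tilde{x}_i) - FE(x))\right]$$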
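The bit-flip formulation can be sketched in plain Python. The snippet below uses the binary-unit free energy F(v) = -b'v - Σ_i log(1 + e^{c_i + W_i v}) with made-up parameters, computes the exact log-PL by flipping every bit in turn, and also computes the one-bit stochastic estimate N · log sigm(F(x̃_i) - F(x)). Averaging the estimate over all indices recovers the exact value.

```python
import math
import random

# Hypothetical parameters: 3 visible units, 2 hidden units.
W = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.7]]  # W[i][j]: hidden i, visible j
b = [0.1, -0.1, 0.2]
c = [0.0, 0.3]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def free_energy(v):
    """F(v) = -b'v - sum_i log(1 + exp(c_i + W_i v))."""
    bv = sum(bj * vj for bj, vj in zip(b, v))
    hidden_term = sum(
        math.log1p(math.exp(c[i] + sum(W[i][j] * v[j] for j in range(len(v)))))
        for i in range(len(c))
    )
    return -bv - hidden_term


def flip(v, i):
    """Return a copy of v with bit i flipped."""
    flipped = list(v)
    flipped[i] = 1 - flipped[i]
    return flipped


def log_pl_exact(v):
    """Sum over i of log P(v_i | v_{-i})."""
    return sum(
        math.log(sigmoid(free_energy(flip(v, i)) - free_energy(v)))
        for i in range(len(v))
    )


def log_pl_stochastic(v, i):
    """N * log sigm(F(v with bit i flipped) - F(v)) for a single index i."""
    return len(v) * math.log(sigmoid(free_energy(flip(v, i)) - free_energy(v)))


v = [1, 0, 1]
rng = random.Random(0)
one_sample_estimate = log_pl_stochastic(v, rng.randrange(len(v)))
averaged = sum(log_pl_stochastic(v, i) for i in range(len(v))) / len(v)
print(log_pl_exact(v), averaged, one_sample_estimate)
```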

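We therefore return this cost, along with the RBM updates, from the get_cost_updates function of the RBM class. Note that we modify the updates dictionary to increment the index of bit i. As a result, bit i cycles over all possible values {0, 1, ..., N-1} from one update to the next.

Note that for CD training, the cross-entropy cost between the input and the reconstruction (the same cost used for the denoising autoencoder) is more reliable than the pseudo-log-likelihood. Here is the code we use to compute the pseudo-likelihood: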

    def get_pseudo_likelihood_cost(self, updates):
        """Stochastic approximation to the pseudo-likelihood"""

        # index of bit i in expression p(x_i | x_{\i})
        bit_i_idx = theano.shared(value=0, name='bit_i_idx')

        # binarize the input image by rounding to nearest integer
        xi = T.round(self.input)

        # calculate free energy for the given bit configuration
        fe_xi = self.free_energy(xi)

        # flip bit x_i of matrix xi and preserve all other bits x_{\i}
        # Equivalent to xi[:,bit_i_idx] = 1-xi[:, bit_i_idx], but assigns
        # the result to xi_flip, instead of working in place on xi.
        xi_flip = T.set_subtensor(xi[:, bit_i_idx], 1 - xi[:, bit_i_idx])

        # calculate free energy with bit flipped
        fe_xi_flip = self.free_energy(xi_flip)

        # equivalent to e^(-FE(x_i)) / (e^(-FE(x_i)) + e^(-FE(x_{\i})))
        cost = T.mean(self.n_visible * T.log(T.nnet.sigmoid(fe_xi_flip -
                                                            fe_xi)))

        # increment bit_i_idx % number as part of updates
        updates[bit_i_idx] = (bit_i_idx + 1) % self.n_visible

        return cost

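The Main Loop

We now have everything we need to train the network.

Before going over the training loop, the reader should become familiar with the function tile_raster_images (see Miscellaneous - DeepLearning 0.1 documentation).

Since RBMs are generative models, we are more interested in sampling from them and visualizing those samples. We also want to visualize the filters (weights) learned by the RBM, to gain insight into how it works. Bear in mind, however, that this does not tell the whole story, since we neglect the biases and plot the weights only up to a multiplicative constant (the weights are rescaled to values between 0 and 1).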
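The rescaling mentioned here can be sketched as follows: shift a filter so its minimum is 0, divide by its range, and map the result to gray levels. The function names and the 2x2 example filter are assumptions for illustration; tile_raster_images in the tutorial operates on NumPy arrays and also tiles many filters into one image.

```python
def scale_to_unit_interval(values, eps=1e-8):
    """Shift and rescale a list of numbers so that they lie in [0, 1]."""
    lowest = min(values)
    shifted = [v - lowest for v in values]
    scale = 1.0 / (max(shifted) + eps)  # eps guards against a constant filter
    return [s * scale for s in shifted]


def to_grey_image(weights, shape):
    """Reshape a flat weight vector into rows of 0..255 gray levels."""
    rows, cols = shape
    pixels = [int(round(255 * s)) for s in scale_to_unit_interval(weights)]
    return [pixels[r * cols:(r + 1) * cols] for r in range(rows)]


# Hypothetical weights of one hidden unit over a 2x2 "image".
filter_weights = [-0.4, 0.1, 0.6, 0.0]
image = to_grey_image(filter_weights, (2, 2))
print(image)
```

Because every filter is rescaled independently, the resulting pictures show the shape of each filter but not the relative magnitude of the weights across units.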

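We can now train the RBM and save a plot of the filters after each training epoch. We train the RBM using PCD, as it has been shown to lead to a better generative model.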

# Imports this fragment relies on. `tile_raster_images` is the tutorial
# helper mentioned above, assumed here to be importable from `utils`.
import timeit
import numpy
import theano
import PIL.Image as Image
from utils import tile_raster_images

# `index`, `x`, `cost`, `updates`, `rbm`, `train_set_x`, `batch_size`,
# `n_train_batches` and `training_epochs` are assumed to be defined
# earlier in the training script (not shown in this section).

# it is ok for a theano function to have no output
# the purpose of train_rbm is solely to update the RBM parameters
train_rbm = theano.function(
    [index],
    cost,
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size]
    },
    name='train_rbm'
)

plotting_time = 0.
start_time = timeit.default_timer()

# go through training epochs
for epoch in range(training_epochs):

    # go through the training set
    mean_cost = []
    for batch_index in range(n_train_batches):
        mean_cost += [train_rbm(batch_index)]

    print('Training epoch %d, cost is ' % epoch, numpy.mean(mean_cost))

    # Plot filters after each training epoch
    plotting_start = timeit.default_timer()
    # Construct image from the weight matrix
    image = Image.fromarray(
        tile_raster_images(
            X=rbm.W.get_value(borrow=True).T,
            img_shape=(28, 28),
            tile_shape=(10, 10),
            tile_spacing=(1, 1)
        )
    )
    image.save('filters_at_epoch_%i.png' % epoch)
    plotting_stop = timeit.default_timer()
    plotting_time += (plotting_stop - plotting_start)

end_time = timeit.default_timer()

pretraining_time = (end_time - start_time) - plotting_time

print('Training took %f minutes' % (pretraining_time / 60.))

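Once the RBM is trained, we can use the gibbs_vhv function to implement the Gibbs chain required for sampling. We initialize the Gibbs chain from test examples (although we could just as well pick them from the training set) in order to speed up convergence and avoid problems with random initialization. We again use Theano's scan op to run 1000 steps before each plot.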

#################################
#     Sampling from the RBM     #
#################################
# find out the number of test samples
number_of_test_samples = test_set_x.get_value(borrow=True).shape[0]

# pick random test examples, with which to initialize the persistent chain
test_idx = rng.randint(number_of_test_samples - n_chains)
persistent_vis_chain = theano.shared(
    numpy.asarray(
        test_set_x.get_value(borrow=True)[test_idx:test_idx + n_chains],
        dtype=theano.config.floatX
    )
)

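Next we create 20 persistent chains in parallel to obtain our samples. To do so, we compile a Theano function that performs the Gibbs steps and updates the state of the persistent chain with the new visible sample. We apply this function iteratively for a large number of steps, plotting the samples every 1000 steps.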

plot_every = 1000
# define one step of Gibbs sampling (mf = mean-field) define a
# function that does `plot_every` steps before returning the
# sample for plotting
(
    [presig_hids, hid_mfs, hid_samples,
     presig_vis, vis_mfs, vis_samples],
    updates
) = theano.scan(
    rbm.gibbs_vhv,
    outputs_info=[None, None, None, None, None, persistent_vis_chain],
    n_steps=plot_every,
    name="gibbs_vhv"
)

# add to updates the shared variable that takes care of our persistent
# chain
updates.update({persistent_vis_chain: vis_samples[-1]})
# construct the function that implements our persistent chain.
# we generate the "mean field" activations for plotting and the actual
# samples for reinitializing the state of our persistent chain
sample_fn = theano.function(
    [],
    [vis_mfs[-1], vis_samples[-1]],
    updates=updates,
    name='sample_fn'
)

# create a space to store the image for plotting ( we need to leave
# room for the tile_spacing as well)
image_data = numpy.zeros(
    (29 * n_samples + 1, 29 * n_chains - 1),
    dtype='uint8'
)
for idx in range(n_samples):
    # generate `plot_every` intermediate samples that we discard,
    # because successive samples in the chain are too correlated
    vis_mf, vis_sample = sample_fn()
    print(' ... plotting sample %d' % idx)
    image_data[29 * idx:29 * idx + 28, :] = tile_raster_images(
        X=vis_mf,
        img_shape=(28, 28),
        tile_shape=(1, n_chains),
        tile_spacing=(1, 1)
    )

# construct image
image = Image.fromarray(image_data)
image.save('samples.png')

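Results

We ran the code with PCD-15, a learning rate of 0.1 and a batch size of 20, for 15 epochs. Training the model took 122.466 minutes on an Intel Xeon E5430 @ 2.66GHz CPU, with a single-threaded GotoBLAS.

The output was the following: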

... loading data
Training epoch 0, cost is  -90.6507246003
Training epoch 1, cost is  -81.235857373
Training epoch 2, cost is  -74.9120966945
Training epoch 3, cost is  -73.0213216101
Training epoch 4, cost is  -68.4098570497
Training epoch 5, cost is  -63.2693021647
Training epoch 6, cost is  -65.99578971
Training epoch 7, cost is  -68.1236650015
Training epoch 8, cost is  -68.3207365087
Training epoch 9, cost is  -64.2949797113
Training epoch 10, cost is  -61.5194867893
Training epoch 11, cost is  -61.6539369402
Training epoch 12, cost is  -63.5465278086
Training epoch 13, cost is  -63.3787093527
Training epoch 14, cost is  -62.755739271
Training took 122.466000 minutes
 ... plotting sample  0
 ... plotting sample  1
 ... plotting sample  2
 ... plotting sample  3
 ... plotting sample  4
 ... plotting sample  5
 ... plotting sample  6
 ... plotting sample  7
 ... plotting sample  8
 ... plotting sample  9

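The picture below shows the filters obtained after 15 epochs.

[Figure: filters obtained after 15 epochs]

Below are the samples generated by the RBM after training. Each row represents a mini-batch of negative particles (samples from independent Gibbs chains). 1000 steps of Gibbs sampling were taken between consecutive rows.

[Figure: samples generated by the RBM]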
