LSTM-Based Chatbot Example (3): Visualizing and Analyzing the LSTM with TensorBoard

1. The LSTM Computation Graph

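In the previous post we built and trained the TensorFlow-based chatbot model and saved the training logs to a specified directory. Run "tensorboard --logdir='XXX'" on the command line and open the URL it prints. TensorBoard then visualizes the whole computation graph and how the parameters change during training. First switch to the "GRAPHS" tab to view the whole computation graph. The sequence is very long, so the sections below expand it piece by piece.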

Fig 1-1 chatbot graphs

1.1 Encoder LSTM

In the chatbot code from the previous post, we used tflearn.lstm to build the Encoder:

```python
# Run the encoder. encoder_output_tensor is unrolled into a vector of shape (?, 1, 200)
# that tflearn.regression can consume.
(encoder_output_tensor, states) = tflearn.lstm(encoder_inputs, self.word_vec_dim,
                                               return_state=True, scope="encoder_lstm")
```

First double-click encoder_lstm to expand it. Its top-level structure looks like this:

Fig 1-2 encoder_lstm
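The panel on the right lists encoder_lstm's Inputs and Outputs. Outputs (8) does not mean that encoder_lstm actually produces 8 outputs: the LSTM returns only encoder_output_tensor and states. Rather, Outputs (8) lists the downstream modules that use, or depend on, these two outputs. The tflearn source of lstm is: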

```python
# @File    : tflearn.layers.recurrent.py
def lstm(incoming, n_units, activation='tanh', inner_activation='sigmoid',
         dropout=None, bias=True, weights_init=None, forget_bias=1.0,
         return_seq=False, return_state=False, initial_state=None,
         dynamic=False, trainable=True, restore=True, reuse=False,
         scope=None, name="LSTM"):
    cell = BasicLSTMCell(n_units, activation=activation,
                         inner_activation=inner_activation,
                         forget_bias=forget_bias, bias=bias,
                         weights_init=weights_init, trainable=trainable,
                         restore=restore, reuse=reuse)
    x = _rnn_template(incoming, cell=cell, dropout=dropout,
                      return_seq=return_seq, return_state=return_state,
                      initial_state=initial_state, dynamic=dynamic,
                      scope=scope, name=name)
    return x
```

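Here _rnn_template() defines the template of the recurrent neural network (RNN), while BasicLSTMCell only defines a single Cell, i.e., one node of the RNN loop. Let's first look at how _rnn_template() unrolls the Cells over time. The core code is: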

```python
# @File    : tflearn.layers.recurrent.py
def _rnn_template(incoming, cell, dropout=None, return_seq=False,
                  return_state=False, initial_state=None, dynamic=False,
                  scope=None, reuse=False, name="LSTM"):
    """ RNN Layer Template. """
    # ... (input_shape and sequence_length are computed earlier in the full source; omitted here)
    with tf.variable_scope(scope, default_name=name, values=[incoming],
                           reuse=reuse) as scope:
        name = scope.name
        _cell = cell
        # Apply dropout
        if dropout:
            if type(dropout) in [tuple, list]:
                in_keep_prob = dropout[0]
                out_keep_prob = dropout[1]
            elif isinstance(dropout, float):
                in_keep_prob, out_keep_prob = dropout, dropout
            else:
                raise Exception("Invalid dropout type (must be a 2-D tuple of "
                                "float)")
            # The cell is wrapped with dropout here; see reference [1]
            cell = DropoutWrapper(cell, in_keep_prob, out_keep_prob)

        inference = incoming
        # If a tensor given, convert it to a per timestep list
        if type(inference) not in [list, np.array]:
            ndim = len(input_shape)
            assert ndim >= 3, "Input dim should be at least 3."
            axes = [1, 0] + list(range(2, ndim))
            inference = tf.transpose(inference, (axes))
            inference = tf.unstack(inference)

        outputs, state = _rnn(cell, inference, dtype=tf.float32,
                              initial_state=initial_state, scope=name,
                              sequence_length=sequence_length)
```
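To make the shape change concrete, here is a minimal pure-Python sketch of what tf.transpose(x, [1, 0, 2]) followed by tf.unstack does to a batch-major input of shape (batch, time, features). It does not use TensorFlow: nested lists stand in for tensors, and the helper name to_time_major_list is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def to_time_major_list(batch: List[Matrix]) -> List[Matrix]:
    """Mimic tf.transpose(x, [1, 0, 2]) followed by tf.unstack(x).

    Input:  batch[b][t][k], shape (batch, time, features)
    Output: steps[t][b][k], a Python list with one (batch, features)
            matrix per time step, which is what the static RNN loop consumes.
    """
    if not batch or not batch[0]:
        raise ValueError("Input dim should be at least 3 (batch, time, features).")
    n_time = len(batch[0])
    # Swap the batch and time axes, then split along the new first (time) axis.
    return [[sample[t] for sample in batch] for t in range(n_time)]


if __name__ == "__main__":
    # 2 samples, 3 time steps, 1 feature each
    x = [[[1.0], [2.0], [3.0]],
         [[4.0], [5.0], [6.0]]]
    steps = to_time_major_list(x)
    print(len(steps))  # 3 time steps
    print(steps[0])    # time step 0 for both samples
```

After this conversion the RNN loop iterates over time steps, and each step sees the whole batch at once.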

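tf.transpose and tf.unstack correspond to the compute nodes in the green box in Fig 1-2. _rnn (TensorFlow's static_rnn) defines the loop that unrolls the cell. Its core code is: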

```python
# @File    : tensorflow.python.ops.rnn.py
def static_rnn(cell,
               inputs,
               initial_state=None,
               dtype=None,
               sequence_length=None,
               scope=None):
    """Creates a recurrent neural network specified by RNNCell `cell`."""
    # ......
    for time, input_ in enumerate(inputs):
        if time > 0:
            varscope.reuse_variables()
        # pylint: disable=cell-var-from-loop
        call_cell = lambda: cell(input_, state)
        # pylint: enable=cell-var-from-loop
        if sequence_length is not None:
            (output, state) = _rnn_step(
                time=time,
                sequence_length=sequence_length,
                min_sequence_length=min_sequence_length,
                max_sequence_length=max_sequence_length,
                zero_output=zero_output,
                state=state,
                call_cell=call_cell,
                state_size=cell.state_size)
        else:
            (output, state) = call_cell()
        outputs.append(output)
    return (outputs, state)
```
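The control flow of static_rnn can be imitated without TensorFlow. The sketch below is a simplification: toy_cell is a hypothetical scalar "running sum" cell standing in for BasicLSTMCell, and there is no variable reuse or sequence_length handling. It shows that the same cell function is called once per time step with the state threaded through, which is why TensorBoard shows one cell node per step.

```python
from typing import Callable, List, Tuple

State = float
Cell = Callable[[float, State], Tuple[float, State]]


def static_unroll(cell: Cell, inputs: List[float],
                  initial_state: State = 0.0) -> Tuple[List[float], State]:
    """Simplified static_rnn: call the same cell once per time step.

    In TensorFlow each call adds a fresh copy of the cell's ops to the graph
    (sharing weights via reuse_variables), so a sequence of length T shows
    up as T cell nodes in TensorBoard.
    """
    state = initial_state
    outputs = []
    for input_ in inputs:
        output, state = cell(input_, state)  # state is passed on to the next step
        outputs.append(output)
    return outputs, state


def toy_cell(x: float, state: State) -> Tuple[float, State]:
    """Hypothetical cell: new state = old state + input; output = new state."""
    new_state = state + x
    return new_state, new_state


if __name__ == "__main__":
    outs, final = static_unroll(toy_cell, [1.0, 2.0, 3.0])
    print(outs, final)  # [1.0, 3.0, 6.0] 6.0
```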

In TensorBoard, double-click the encoder_lstm sub-module inside encoder_lstm to see the unrolled RNN sequence:

Fig 1-3 encoder_lstm/encoder_lstm

1.2 BasicLSTMCell

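Each node of the unrolled sequence is the BasicLSTMCell created in tflearn.lstm() above. tflearn's BasicLSTMCell is implemented after the paper Recurrent Neural Network Regularization. The paper first restates the classic LSTM, which the earlier literature describes one hidden variable and one update step at a time, in a batched matrix form: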

Fig 1-4 Batched update equations
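For reference, the batched equations as written in the paper are reproduced below. $T_{2n,4n}$ is an affine map from $\mathbb{R}^{2n}$ to $\mathbb{R}^{4n}$ (i.e., $Wx + b$), $n$ is the state size, and $\odot$ is element-wise multiplication:

$$
\begin{pmatrix} i \\ f \\ o \\ g \end{pmatrix}
=
\begin{pmatrix} \mathrm{sigm} \\ \mathrm{sigm} \\ \mathrm{sigm} \\ \tanh \end{pmatrix}
T_{2n,4n}
\begin{pmatrix} h_t^{l-1} \\ h_{t-1}^{l} \end{pmatrix}
$$

$$
c_t^l = f \odot c_{t-1}^l + i \odot g
$$

$$
h_t^l = o \odot \tanh\left(c_t^l\right)
$$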
Here $i$ is the input gate, $f$ the forget gate, $o$ the output gate, and $g$ the generator of new candidate values. $h_{t-1}^{l}$ is the hidden state of layer $l$ at time $t-1$, and $h_t^{l-1}$ is the hidden state of the layer below at time $t$, which plays the role of the external input at time $t$ (for the first layer, $h_t^{0}$ is exactly the input $x_t$). The LSTM model in the paper is shown below:

Fig 1-5 lstm cell
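tflearn's BasicLSTMCell code is shown below. Note that the code splits the gates in the order i, j, f, o, where j plays the role of $g$ in the equations above: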

```python
# @File    : tflearn.layers.recurrent.py
class BasicLSTMCell(core_rnn_cell.RNNCell):
    """ TF Basic LSTM recurrent network cell with extra customization params.

    The implementation is based on: http://arxiv.org/abs/1409.2329.

    We add forget_bias (default: 1) to the biases of the forget gate in order to
    reduce the scale of forgetting in the beginning of the training.

    It does not allow cell clipping, a projection layer, and does not
    use peep-hole connections: it is the basic baseline.

    For advanced models, please use the full LSTMCell that follows.
    """

    def __call__(self, inputs, state, scope=None):
        """Long short-term memory cell (LSTM)."""
        with tf.variable_scope(scope or type(self).__name__):  # "BasicLSTMCell"
            # Parameters of gates are concatenated into one multiply for efficiency.
            if self._state_is_tuple:
                c, h = state
            else:
                # Legacy TF < 1.0 argument order: split(split_dim, num_split, value)
                c, h = array_ops.split(1, 2, state)
            concat = _linear([inputs, h], 4 * self._num_units, True, 0.,
                             self.weights_init, self.trainable, self.restore,
                             self.reuse)

            # i = input_gate, j = new_input, f = forget_gate, o = output_gate
            i, j, f, o = array_ops.split(value=concat, num_or_size_splits=4,
                                         axis=1)

            # apply batch normalization to inner state and gates
            if self.batch_norm == True:
                i = batch_normalization(i, gamma=0.1, trainable=self.trainable, restore=self.restore, reuse=self.reuse)
                j = batch_normalization(j, gamma=0.1, trainable=self.trainable, restore=self.restore, reuse=self.reuse)
                f = batch_normalization(f, gamma=0.1, trainable=self.trainable, restore=self.restore, reuse=self.reuse)
                o = batch_normalization(o, gamma=0.1, trainable=self.trainable, restore=self.restore, reuse=self.reuse)

            new_c = (c * self._inner_activation(f + self._forget_bias) +
                     self._inner_activation(i) * self._activation(j))

            # hidden-to-hidden batch normalization
            if self.batch_norm == True:
                batch_norm_new_c = batch_normalization(new_c, gamma=0.1, trainable=self.trainable, restore=self.restore, reuse=self.reuse)
                new_h = self._activation(batch_norm_new_c) * self._inner_activation(o)
            else:
                new_h = self._activation(new_c) * self._inner_activation(o)

            if self._state_is_tuple:
                new_state = core_rnn_cell.LSTMStateTuple(new_c, new_h)
            else:
                new_state = array_ops.concat([new_c, new_h], 1)

            # Retrieve RNN Variables
            with tf.variable_scope('Linear', reuse=True):
                self.W = tf.get_variable('Matrix')
                self.b = tf.get_variable('Bias')

            return new_h, new_state
```
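To see the arithmetic of __call__ without the TensorFlow plumbing, here is a minimal pure-Python sketch of one step for a single sample. It assumes the default activations (tanh and sigmoid), leaves out batch normalization, and uses a hypothetical flat weight layout (W, b) in place of _linear's variables; the gate order i, j, f, o and the forget_bias term follow the code above.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def basic_lstm_step(x: Vector, c: Vector, h: Vector,
                    W: List[Vector], b: Vector,
                    forget_bias: float = 1.0) -> Tuple[Vector, Vector]:
    """One BasicLSTMCell-style update (no batch norm, single sample).

    W has 4*n rows, each of length len(x) + n, and b has 4*n entries, so one
    matrix-vector product over [x, h] yields all four gates at once (like _linear).
    Returns (new_c, new_h).
    """
    n = len(c)
    xh = x + h  # list concatenation of [inputs, h]
    concat = [sum(w * v for w, v in zip(row, xh)) + bias
              for row, bias in zip(W, b)]
    i, j, f, o = (concat[k * n:(k + 1) * n] for k in range(4))
    new_c = [c_k * sigmoid(f_k + forget_bias) + sigmoid(i_k) * math.tanh(j_k)
             for c_k, i_k, j_k, f_k in zip(c, i, j, f)]
    new_h = [math.tanh(nc) * sigmoid(o_k) for nc, o_k in zip(new_c, o)]
    return new_c, new_h


if __name__ == "__main__":
    n, d = 2, 3  # state size, input size
    rng = random.Random(0)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d + n)] for _ in range(4 * n)]
    b = [0.0] * (4 * n)
    c, h = [0.0] * n, [0.0] * n
    for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]):  # a 2-step input sequence
        c, h = basic_lstm_step(x, c, h, W, b)
    print(c, h)
```

With all weights zero, every gate pre-activation is 0, so the update reduces to $c_t = c_{t-1}\,\mathrm{sigm}(1)$: the forget_bias of 1 keeps most of the old cell state early in training, which is the motivation stated in the docstring.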

In TensorBoard, the computation graph of BasicLSTMCell looks like this:

Fig 1-5 BasicLSTMCell
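Red box 1 holds the value of $c_t^l$ and red box 2 the value of $h_t^l$. Both are inputs to the Cell at time $t+1$, which is why an output arrow points to BasicLSTMCell_1 (the screenshot shows the Cell at time 0, so the next step is time 1 and its Cell is named BasicLSTMCell_1).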

1.3 Dropout Regularization

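Regularization mainly addresses overfitting (a large gap between training error and test error) and improves the model's ability to generalize. Common regularization methods include parameter norm penalties, dataset augmentation, parameter sharing, early stopping, bagging, and dropout. The main idea of bagging is to train several different models separately and then let all of them vote on the output for a test example; it is a form of model averaging. It can be shown mathematically that, on average, the ensemble performs at least as well as any of its member models, and if the members' errors are independent, the ensemble performs significantly better than its members. The obvious drawback of bagging is that the number of parameters to train grows several-fold, so both storage and computation become expensive.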
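The "at least as good on average" argument can be checked numerically. Following the derivation in [2], suppose each of k models makes a zero-mean error with variance v and pairwise covariance c; the averaged prediction then has expected squared error v/k + (k-1)c/k, which never exceeds v when c ≤ v. The sketch below (function names are illustrative) compares that formula with a Monte Carlo simulation that generates correlated Gaussian errors.

```python
import math
import random


def ensemble_mse(k: int, v: float, c: float) -> float:
    """Expected squared error of the average of k models whose zero-mean
    errors have variance v and pairwise covariance c."""
    return v / k + (k - 1) * c / k


def simulate_ensemble_mse(k: int, v: float, c: float,
                          trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate. e_i = sqrt(c)*z + sqrt(v - c)*z_i gives
    Var(e_i) = v and Cov(e_i, e_j) = c for i != j (requires 0 <= c <= v)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        shared = rng.gauss(0.0, 1.0)  # error component common to all models
        errors = [math.sqrt(c) * shared + math.sqrt(v - c) * rng.gauss(0.0, 1.0)
                  for _ in range(k)]
        mean_error = sum(errors) / k
        total += mean_error ** 2
    return total / trials


if __name__ == "__main__":
    k, v = 5, 1.0
    for c in (1.0, 0.25, 0.0):  # perfectly correlated ... independent errors
        print(c, ensemble_mse(k, v, c), round(simulate_ensemble_mse(k, v, c), 3))
```

With perfectly correlated errors (c = v) averaging does not help at all, while with independent errors (c = 0) the error drops to v/k.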
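Dropout provides a cheap approximation to a bagged ensemble. The ensemble trained by dropout consists of all sub-networks that can be formed by removing non-output units from a base network, as shown in Fig 1-6. It differs from bagging in that bagged models are all independent, whereas dropout's models share parameters, each model inheriting a different subset of the parent network's parameters. This parameter sharing makes it possible to represent an exponential number of models within a limited amount of memory.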

Fig 1-6 Dropout trains an ensemble consisting of all sub-networks
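Dropout amounts to applying random mask noise to the model's hidden units (a mask value of 0 drops the unit; a mask value of 1 keeps it in the computation graph). This is a highly intelligent, adaptive destruction of the information content of the input rather than destruction of the raw input values, which is one reason dropout has become such a widely used regularizer. The main contribution of the paper Recurrent Neural Network Regularization is a customized way of applying dropout to LSTMs, namely only on the non-recurrent connections, which achieves very good results. Fig 1-7 compares it with conventional dropout.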
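As a concrete picture of the random-mask view, the following pure-Python sketch (the helper name dropout is hypothetical) samples a 0/1 mask with keep probability keep_prob and, like tf.nn.dropout, scales the surviving units by 1/keep_prob so the expected activation is unchanged ("inverted dropout").

```python
import random
from typing import List, Optional


def dropout(values: List[float], keep_prob: float,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: keep each unit with probability keep_prob and
    scale kept units by 1/keep_prob; dropped units become 0."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    rng = rng if rng is not None else random.Random()
    mask = [1 if rng.random() < keep_prob else 0 for _ in values]  # 1 = keep, 0 = drop
    return [v * m / keep_prob for v, m in zip(values, mask)]


if __name__ == "__main__":
    rng = random.Random(42)
    h = [1.0] * 10
    print(dropout(h, 0.5, rng))  # each entry is either 0.0 or 2.0
    # Averaged over many draws, each unit's expected value stays close to 1.0.
    draws = [dropout([1.0], 0.8, rng)[0] for _ in range(100_000)]
    print(sum(draws) / len(draws))
```

Each fresh mask selects one sub-network from the ensemble in Fig 1-6, and all of them share the same underlying weights.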

Fig 1-7 Dropout comparison
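In the figure above, the left side shows the conventional approach: every dashed arrow marks a place where dropout is applied (i.e., every hidden unit can be dropped). The right side shows the paper's scheme, which applies dropout only at the input and output of each Cell (each layer of the network) and not on the recurrent connections between time steps. An $L$-layer network therefore applies dropout only $L+1$ times per time step; the two-layer network on the right applies the three dropout operations marked by the green circles. With dropout added, the equations in Fig 1-4 become: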
　
Fig 1-8 LSTM equations with dropout
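For reference, the paper's dropout-augmented equations are reproduced below. $\mathbf{D}$ is the dropout operator, which sets a random subset of its argument to zero. It is applied only to $h_t^{l-1}$, the input coming from the layer below, and not to the recurrent term $h_{t-1}^{l}$; the updates of $c_t^l$ and $h_t^l$ are the same as without dropout:

$$
\begin{pmatrix} i \\ f \\ o \\ g \end{pmatrix}
=
\begin{pmatrix} \mathrm{sigm} \\ \mathrm{sigm} \\ \mathrm{sigm} \\ \tanh \end{pmatrix}
T_{2n,4n}
\begin{pmatrix} \mathbf{D}\left(h_t^{l-1}\right) \\ h_{t-1}^{l} \end{pmatrix}
$$

$$
c_t^l = f \odot c_{t-1}^l + i \odot g
$$

$$
h_t^l = o \odot \tanh\left(c_t^l\right)
$$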
In the code (tflearn, built on TensorFlow) this is implemented as follows:

```python
# @File    : tflearn.layers.recurrent.py
def _rnn_template(incoming, cell, dropout=None, return_seq=False,
                  return_state=False, initial_state=None, dynamic=False,
                  scope=None, reuse=False, name="LSTM"):
    """ RNN Layer Template. """
    with tf.variable_scope(scope, default_name=name, values=[incoming],
                           reuse=reuse) as scope:
        name = scope.name
        _cell = cell
        # Apply dropout
        if dropout:
            if type(dropout) in [tuple, list]:
                in_keep_prob = dropout[0]
                out_keep_prob = dropout[1]
            elif isinstance(dropout, float):
                in_keep_prob, out_keep_prob = dropout, dropout
            else:
                raise Exception("Invalid dropout type (must be a 2-D tuple of "
                                "float)")
            cell = DropoutWrapper(cell, in_keep_prob, out_keep_prob)
        # ... (remainder of _rnn_template as shown in section 1.1)


class DropoutWrapper(core_rnn_cell.RNNCell):
    """Operator adding dropout to inputs and outputs of the given cell."""

    def __call__(self, inputs, state, scope=None):
        """Run the cell with the declared dropouts."""
        is_training = config.get_training_mode()
        # Dropout on the cell input, active only in training mode
        if (not isinstance(self._input_keep_prob, float) or
                self._input_keep_prob < 1):
            inputs = tf.cond(is_training,
                             lambda: tf.nn.dropout(inputs,
                                                   self._input_keep_prob,
                                                   seed=self._seed),
                             lambda: inputs)
        output, new_state = self._cell(inputs, state)
        # Dropout on the cell output; new_state (the recurrent path) is not dropped
        if (not isinstance(self._output_keep_prob, float) or
                self._output_keep_prob < 1):
            output = tf.cond(is_training,
                             lambda: tf.nn.dropout(output,
                                                   self._output_keep_prob,
                                                   seed=self._seed),
                             lambda: output)
        return output, new_state
```
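The logic of DropoutWrapper (wrap a cell, drop its inputs and outputs in training mode, pass them through untouched at inference time, and never drop the state) can be sketched in plain Python. Everything here is a hypothetical illustration, not tflearn's API: DropoutWrapperSketch, the training flag (standing in for config.get_training_mode() plus tf.cond), and the toy identity_cell.

```python
import random
from typing import Callable, List, Optional, Tuple

Vector = List[float]
Cell = Callable[[Vector, Vector], Tuple[Vector, Vector]]


def _dropout(values: Vector, keep_prob: float, rng: random.Random) -> Vector:
    # Inverted dropout, as tf.nn.dropout does: zero a unit or scale it by 1/keep_prob.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


class DropoutWrapperSketch:
    """Hypothetical stand-in for tflearn's DropoutWrapper."""

    def __init__(self, cell: Cell, input_keep_prob: float = 1.0,
                 output_keep_prob: float = 1.0, seed: Optional[int] = None):
        self._cell = cell
        self._input_keep_prob = input_keep_prob
        self._output_keep_prob = output_keep_prob
        self._rng = random.Random(seed)

    def __call__(self, inputs: Vector, state: Vector,
                 training: bool) -> Tuple[Vector, Vector]:
        # Dropout on the cell input (training mode only).
        if training and self._input_keep_prob < 1:
            inputs = _dropout(inputs, self._input_keep_prob, self._rng)
        output, new_state = self._cell(inputs, state)
        # Dropout on the cell output; the state passed to the next step is untouched.
        if training and self._output_keep_prob < 1:
            output = _dropout(output, self._output_keep_prob, self._rng)
        return output, new_state


def identity_cell(inputs: Vector, state: Vector) -> Tuple[Vector, Vector]:
    """Toy cell: output = inputs, state unchanged."""
    return list(inputs), state


if __name__ == "__main__":
    wrapped = DropoutWrapperSketch(identity_cell, 0.5, 0.5, seed=1)
    x, s = [1.0] * 8, [0.0]
    print(wrapped(x, s, training=False))  # inference: outputs equal the inputs
    print(wrapped(x, s, training=True))   # training: entries are 0.0 or 4.0
```

Because the recurrent state bypasses both dropout calls, this wrapper matches the paper's rule of dropping only non-recurrent connections.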

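This completes the walkthrough of the Encoder LSTM: its implementation in TensorFlow and its computation graph in TensorBoard. The Decoder has a similar structure, so it is not repeated here. The next post will focus on how, once the LSTM Encoder-Decoder model is built, its parameters are optimized with SGD based on the mean squared error.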

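References:
[1] W. Zaremba, I. Sutskever, O. Vinyals. Recurrent Neural Network Regularization. arXiv:1409.2329, 2014.
[2] I. Goodfellow, Y. Bengio, A. Courville. Deep Learning. MIT Press, 2016.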