CS224n Assignment 3 Reference Solutions

$Assignment\#3 -solution\quad By\ Jonariguez$

The code for all coding problems has been uploaded to github/CS224n/Jonariguez

The code for each coding problem can be found in the .py files under the corresponding folder Assignment3_Code

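Solution:
ii) Relying on the word alone resembles a purely statistical approach: the model performs poorly on low-frequency or unseen words. Words can also be ambiguous, so the word by itself may not tell us whether it is an entity, or which type of entity it is.
iii) Context, part-of-speech tags, etc.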

Solution:
i) Work out the shapes of all variables:

$x^{(t)}\in \mathbb{R}^{1\times V}$
$x^{(t)}L\in \mathbb{R}^{1\times D}$
$e^{(t)}\in \mathbb{R}^{1\times (2w+1)D}$
$h^{(t)}\in \mathbb{R}^{1\times H}$
$W\in \mathbb{R}^{(2w+1)D\times H}$
$\hat{y}^{(t)}\in \mathbb{R}^{1\times C}$
$U\in \mathbb{R}^{H\times C}$
$b_1\in \mathbb{R}^{1\times H}$
$b_2\in \mathbb{R}^{1\times C}$

ii) The cost for a single word is:

$e^{(t)}=[x^{(t-w)}L,...,x^{(t)}L,...,x^{(t+w)}L]\rightarrow O(wV)$
$h^{(t)}=ReLU(e^{(t)}W+b_1)\rightarrow O(wDH)$
$\hat{y}^{(t)}=softmax(h^{(t)}U+b_2)\rightarrow O(HC)$
$J=CE(y^{(t)},\hat{y}^{(t)})=-\sum_{i}{y_i^{(t)}log(\hat{y}_i^{(t)})} \rightarrow O(C)$

So the complexity per word is: $O (w V + w D H + H C)$
For a sentence of length T the complexity is: $O (T (w V + w D H + H C))$

Solution:

In Python 3, import StringIO with from io import StringIO.
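If the same file has to run under both Python 2 and Python 3, a guarded import is one common pattern. The sketch below is only an illustration: the fallback order and the two-line sample text are assumptions, not part of the assignment code.

```python
# Try the Python 2 module first, then fall back to the Python 3 location.
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3

# Hypothetical tab-separated "token<TAB>label" text, read as if it were a file.
buf = StringIO(u"EU\tORG\nrejects\tO\n")
lines = [line.strip().split("\t") for line in buf]
```

Either branch yields an object that can be iterated line by line like a file handle.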

Solution:

i) ① In the window-based model $W\in \mathbb{R}^{(2w+1)D\times H}$, whereas in the RNN $W_x\in \mathbb{R}^{D\times H}$;

② The RNN has an additional $W_h\in \mathbb{R}^{H\times H}$.

ii) $\mathcal{O}((D+H)\cdot H\cdot T)$.
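To make the comparison concrete, the shapes above can be turned into simple counts. This is a minimal sketch; the function names and the sample values of w, D, H and T are hypothetical and chosen only for illustration.

```python
def window_model_params(w, D, H):
    """Number of entries in W of the window-based model: (2w+1)D x H."""
    return (2 * w + 1) * D * H

def rnn_params(D, H):
    """Number of entries in W_x (D x H) plus W_h (H x H) of the RNN."""
    return D * H + H * H

def rnn_sentence_cost(T, D, H):
    """Multiply-adds of the recurrent part over T time steps: (D + H) * H * T."""
    return (D + H) * H * T

# Hypothetical sizes, not taken from the assignment configuration.
w, D, H, T = 1, 50, 200, 20
window_count = window_model_params(w, D, H)
rnn_count = rnn_params(D, H)
cost = rnn_sentence_cost(T, D, H)
```

The count for the window model grows with the window size w, while the RNN count does not depend on how much context is used.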

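Solution:
ii) ① The $F_1$ score is not as direct or intuitive to interpret.

② Computing the $F_1$ score requires the whole corpus, which makes it hard to use with mini-batch training and parallel computation.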
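The second point can be illustrated with a small calculation: averaging per-batch $F_1$ scores does not, in general, give the corpus-level $F_1$. The counts below are made up for illustration, and the helper name `f1` is hypothetical.

```python
def f1(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2.0 * tp / denom if denom else 0.0

# Hypothetical (TP, FP, FN) counts for two mini-batches.
batches = [(9, 1, 0), (1, 0, 9)]

# Averaging the per-batch scores ...
mean_of_batch_f1 = sum(f1(*b) for b in batches) / len(batches)

# ... differs from pooling the counts over the whole corpus first.
tp, fp, fn = (sum(col) for col in zip(*batches))
corpus_f1 = f1(tp, fp, fn)
```

Because the two numbers disagree, the score cannot simply be accumulated batch by batch the way a summed cross-entropy loss can.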

"""
What __call__ means: if instan is an instance of this class, then instan(inputs, state) actually invokes __call__(). Implementing the forward pass inside __call__() therefore makes the cell convenient to call.
"""
def __call__(self, inputs, state, scope=None):
    """Updates the state using the previous @state and @inputs.
    Remember the RNN equations are:

    h_t = sigmoid(x_t W_x + h_{t-1} W_h + b)

    TODO: In the code below, implement an RNN cell using @inputs
    (x_t above) and the state (h_{t-1} above).
        - Define W_x, W_h, b to be variables of the appropriate shape
          using the `tf.get_variable' functions. Make sure you use
          the names "W_x", "W_h" and "b"!
        - Compute @new_state (h_t) defined above
    Tips:
        - Remember to initialize your matrices using the xavier
          initialization as before.
    Args:
        inputs: is the input vector of size [None, self.input_size]
        state: is the previous state vector of size [None, self.state_size]
        scope: is the name of the scope to be used when defining the variables inside.
    Returns:
        a pair of the output vector and the new state vector.
    """
    scope = scope or type(self).__name__

    # It's always a good idea to scope variables in functions lest they
    # be defined elsewhere!
    with tf.variable_scope(scope):
        ### YOUR CODE HERE (~6-10 lines)
        W_x = tf.get_variable('W_x', [self.input_size, self._state_size], dtype=tf.float32,
                              initializer=tf.contrib.layers.xavier_initializer())
        W_h = tf.get_variable('W_h', [self._state_size, self._state_size], dtype=tf.float32,
                              initializer=tf.contrib.layers.xavier_initializer())
        b = tf.get_variable('b', [self._state_size], dtype=tf.float32,
                            initializer=tf.contrib.layers.xavier_initializer())
        # h_t = sigmoid(x_t W_x + h_{t-1} W_h + b)
        new_state = tf.nn.sigmoid(tf.matmul(inputs, W_x) + tf.matmul(state, W_h) + b)
        ### END YOUR CODE ###
    # For an RNN , the output and state are the same (N.B. this
    # isn't true for an LSTM, though we aren't using one of those in
    # our assignment)
    output = new_state
    return output, new_state
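The recurrence h_t = sigmoid(x_t W_x + h_{t-1} W_h + b) can also be checked by hand without TensorFlow. The following is a minimal pure-Python sketch for a single example; the helper names `matvec`, `sigmoid` and `rnn_step` and the toy weights are assumptions made for illustration and are not part of the assignment code.

```python
import math

def matvec(x, W):
    """Row vector x (length n) times matrix W (n x m), giving a row vector of length m."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """h_t = sigmoid(x_t W_x + h_{t-1} W_h + b) for one example."""
    xw = matvec(x_t, W_x)
    hw = matvec(h_prev, W_h)
    return [sigmoid(a + c + d) for a, c, d in zip(xw, hw, b)]

# Toy sizes: input_size = 3, state_size = 2, all weights zero.
W_x = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
W_h = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]

h = [0.0, 0.0]
for x_t in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]):
    # The same W_x, W_h and b are reused at every time step.
    h = rnn_step(x_t, h, W_x, W_h, b)
```

With all-zero weights every pre-activation is 0, so each component of the state is sigmoid(0) = 0.5 regardless of the input.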

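Solution:
i) Without the mask vector, the time steps t > T, which are not part of the sentence, are still counted in the final loss and inflate it. Since both x and y are 0 at these positions, the learned model becomes biased toward predictions of the form y = x.
The losses produced by the zero vectors padded at the end also affect the gradient updates of the earlier hidden states.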
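A small numeric sketch of the first point: when padded positions are included, the averaged cross-entropy no longer reflects the real tokens. The probabilities, labels and helper names below are hypothetical and only serve to show the effect of the mask.

```python
import math

def token_loss(probs, label):
    """Cross-entropy of one token: -log of the probability assigned to the gold label."""
    return -math.log(probs[label])

def mean_loss(pred_probs, labels, mask=None):
    """Average token loss; when a mask is given, positions marked False are skipped."""
    if mask is None:
        mask = [True] * len(labels)
    losses = [token_loss(p, y) for p, y, m in zip(pred_probs, labels, mask) if m]
    return sum(losses) / len(losses)

# Hypothetical sentence with 2 real tokens padded to length 4.
# Class 1 plays the role of the padding label, which the model predicts easily.
pred_probs = [[0.5, 0.5], [0.25, 0.75], [0.01, 0.99], [0.01, 0.99]]
labels = [0, 0, 1, 1]
mask = [True, True, False, False]

masked = mean_loss(pred_probs, labels, mask)   # only the 2 real tokens
unmasked = mean_loss(pred_probs, labels)       # padding included
```

In this toy case the easy padded positions pull the unmasked average well below the loss on the real tokens, so the training signal is dominated by padding rather than by the sentence.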

ii)

def pad_sequences(data, max_length):
    ret = []

    # Use this zero vector when padding sequences.
    zero_vector = [0] * Config.n_features
    zero_label = 4  # corresponds to the 'O' tag

    for sentence, labels in data:
        ### YOUR CODE HERE (~4-6 lines)
        mask = [True] * len(sentence)
        if len(sentence) >= max_length:
            # Truncate sentences that are too long.
            sentence_pad = sentence[:max_length]
            labels_pad = labels[:max_length]
            mask_pad = mask[:max_length]
        else:
            # Pad short sentences and mark the padded positions as False.
            pad_n = max_length - len(sentence)
            sentence_pad = sentence + [zero_vector] * pad_n
            labels_pad = labels + [zero_label] * pad_n
            mask_pad = mask + [False] * pad_n
        ret.append((sentence_pad, labels_pad, mask_pad))
        ### END YOUR CODE ###
    return ret

def add_placeholders(self):
    ### YOUR CODE HERE (~4-6 lines)
    self.input_placeholder = tf.placeholder(tf.int32, [None, self.max_length, self.config.n_features], name='input')
    self.labels_placeholder = tf.placeholder(tf.int32, [None, self.max_length], name='label')
    self.mask_placeholder = tf.placeholder(tf.bool, [None, self.max_length], name='mask')
    self.dropout_placeholder = tf.placeholder(tf.float32, name='dropout')
    ### END YOUR CODE

def add_embedding(self):
    ### YOUR CODE HERE (~4-6 lines)
    # Note: use the pretrained word vectors.
    embed = tf.Variable(self.pretrained_embeddings)
    embeddings = tf.nn.embedding_lookup(embed, self.input_placeholder)
    embeddings = tf.reshape(embeddings, [-1, self.max_length, self.config.n_features * self.config.embed_size])
    ### END YOUR CODE
    return embeddings

def add_training_op(self, loss):
    ### YOUR CODE HERE (~1-2 lines)
    train_op = tf.train.AdamOptimizer(self.config.lr).minimize(loss)
    ### END YOUR CODE
    return train_op

def add_prediction_op(self):
    x = self.add_embedding()
    dropout_rate = self.dropout_placeholder

    preds = []  # Predicted output at each timestep should go here!

    # Use the cell defined below. For Q2, we will just be using the
    # RNNCell you defined, but for Q3, we will run this code again
    # with a GRU cell!
    if self.config.cell == "rnn":
        cell = RNNCell(Config.n_features * Config.embed_size, Config.hidden_size)
    elif self.config.cell == "gru":
        cell = GRUCell(Config.n_features * Config.embed_size, Config.hidden_size)
    else:
        raise ValueError("Unsupported cell type: " + self.config.cell)

    # Define U and b2 as variables.
    # Initialize state as vector of zeros.
    ### YOUR CODE HERE (~4-6 lines)
    with tf.variable_scope('output'):
        U = tf.get_variable('U', [self.config.hidden_size, self.config.n_classes],
                            initializer=tf.contrib.layers.xavier_initializer())
        b2 = tf.get_variable('b2', [self.config.n_classes],
                             initializer=tf.constant_initializer(0))
    # Initialize h0. Its last dimension is clearly hidden_size and its first
    # dimension should be batch_size. Instead of hard-coding batch_size, read
    # it from the first dimension of x's shape.
    x_shape = tf.shape(x)
    new_state = tf.zeros((x_shape[0], self.config.hidden_size))
    ### END YOUR CODE

    with tf.variable_scope("RNN"):
        # 1. To train the RNN model we need its cell, i.e. an instance of the
        #    RNNCell class from q2_rnn_cell.py (already created above).
        # 2. Recall that __call__(input, state, scope) in q2_rnn_cell defines
        #    W_h, W_x and b under variable_scope(scope). The first call to the
        #    cell creates the variables in that scope; every later call must
        #    run tf.get_variable_scope().reuse_variables() so the existing
        #    variables are reused instead of defining new W_h, W_x and b.
        # 3. h_0 is a constant initial hidden state, i.e. it is not trainable.
        # 4. Everything else follows the equations in the comments below;
        #    append each output to preds.
        for time_step in range(self.max_length):
            ### YOUR CODE HERE (~6-10 lines)
            if time_step > 0:
                tf.get_variable_scope().reuse_variables()
            # o_t, h_t = cell(x_t, h_{t-1})
            # In x[:, time_step, :] the first ":" takes every example in the
            # batch, time_step picks the word position, and the last ":" takes
            # all features: the features of word time_step for the whole batch.
            output_state, new_state = cell(x[:, time_step, :], new_state, 'rnn-hidden')
            # o_drop_t = Dropout(o_t, dropout_rate)
            # (the placeholder value is passed as keep_prob)
            output_dropout = tf.nn.dropout(output_state, keep_prob=dropout_rate)
            # y_t = o_drop_t U + b_2
            y_t = tf.matmul(output_dropout, U) + b2
            preds.append(y_t)
            ### END YOUR CODE

    # Make sure to reshape @preds here.
    ### YOUR CODE HERE (~2-4 lines)
    # Shape of preds: it is a list of length self.max_length whose elements are
    # the outputs for one batch, each of shape [batch_size, n_classes]. As a
    # list it is [max_length, batch_size, n_classes]; stacking along axis=1
    # gives [batch_size, max_length, n_classes].
    # tf.pack has been replaced by tf.stack:
    # https://blog.csdn.net/qq_33655521/article/details/83750546
    preds = tf.stack(preds, axis=1)
    ### END YOUR CODE

    assert preds.get_shape().as_list() == [None, self.max_length, self.config.n_classes], "predictions are not of the right shape. Expected {}, got {}".format([None, self.max_length, self.config.n_classes], preds.get_shape().as_list())
    return preds

def add_loss_op(self, preds):
    ### YOUR CODE HERE (~2-4 lines)
    # Use the mask to pick out the real (non-padding) preds and labels,
    # then compute the cross-entropy as usual.
    mask_preds = tf.boolean_mask(preds, self.mask_placeholder)
    mask_label = tf.boolean_mask(self.labels_placeholder, self.mask_placeholder)
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=mask_label, logits=mask_preds))
    # preds_,labels= [],[]
    # pred_shape = tf.size(preds)
    # print(pred_shape.eval())
    # print(pred_shape[0].eval())
    # for i in range(tf.to_int32(pred_shape[0])):
    #     batch_data = preds[i]
    #     # For the i-th example of the batch, look at every word (index j) of the sentence
    #     preds_.append([batch_data[j] for j in range(self.max_length) if self.mask_placeholder[i][j]==True])
    #     labels.append([self.labels_placeholder[i][j] for j in range(self.max_length) if self.mask_placeholder[i][j]==True])
    # loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,logits=preds_)
    # loss = tf.reduce_mean(loss)
    ### END YOUR CODE
    return loss

Results:

DEBUG:Token-level confusion matrix:
go\gu       PER         ORG         LOC         MISC        O
PER         2968.00     26.00       75.00       10.00       70.00
ORG         111.00      1663.00     99.00       81.00       138.00
LOC         27.00       66.00       1938.00     22.00       41.00
MISC        32.00       33.00       47.00       1054.00     102.00
O           34.00       36.00       22.00       30.00       42637.00
DEBUG:Token-level scores:
label   acc     prec    rec     f1
PER     0.99    0.94    0.94    0.94
ORG     0.99    0.91    0.79    0.85
LOC     0.99    0.89    0.93    0.91
MISC    0.99    0.88    0.83    0.86
O       0.99    0.99    1.00    0.99
micro   0.99    0.98    0.98    0.98
macro   0.99    0.92    0.90    0.91
not-O   0.99    0.91    0.89    0.90
INFO:Entity level P/R/F1: 0.85/0.86/0.86
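The per-label precision, recall and F1 in the RNN log can be recomputed from its token-level confusion matrix, where rows are gold labels and columns are guessed labels. The sketch below copies the matrix values from that log; the function name `scores` is a hypothetical helper, not the assignment's evaluation code.

```python
labels = ["PER", "ORG", "LOC", "MISC", "O"]

# Rows: gold labels, columns: guessed labels (values copied from the RNN log).
confusion = [
    [2968, 26, 75, 10, 70],
    [111, 1663, 99, 81, 138],
    [27, 66, 1938, 22, 41],
    [32, 33, 47, 1054, 102],
    [34, 36, 22, 30, 42637],
]

def scores(confusion, k):
    """Precision, recall and F1 of label k from a gold-by-guess confusion matrix."""
    tp = confusion[k][k]
    guessed = sum(row[k] for row in confusion)  # column sum: tokens guessed as k
    gold = sum(confusion[k])                    # row sum: tokens whose gold label is k
    prec = tp / float(guessed)
    rec = tp / float(gold)
    f1 = 2 * prec * rec / (prec + rec)
    return prec, rec, f1

for k, name in enumerate(labels):
    print("%-5s prec=%.2f rec=%.2f f1=%.2f" % ((name,) + scores(confusion, k)))
```

Precision divides the diagonal entry by its column sum and recall divides it by its row sum, which is why ORG, with many gold ORG tokens guessed as other labels, has the lowest recall.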

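Solution:
i) ① When sentences are long, gradients vanish easily;
② The model cannot use the following context when making a decision.

ii) ① Add GRU gating units;
② Use a bidirectional RNN, i.e. a biRNN.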

def __call__(self, inputs, state, scope=None):
    """Updates the state using the previous @state and @inputs.
    Remember the GRU equations are:

    z_t = sigmoid(x_t U_z + h_{t-1} W_z + b_z)
    r_t = sigmoid(x_t U_r + h_{t-1} W_r + b_r)
    o_t = tanh(x_t U_o + r_t * h_{t-1} W_o + b_o)
    h_t = z_t * h_{t-1} + (1 - z_t) * o_t

    TODO: In the code below, implement an GRU cell using @inputs
    (x_t above) and the state (h_{t-1} above).
        - Define W_r, U_r, b_r, W_z, U_z, b_z and W_o, U_o, b_o to
          be variables of the appropriate shape using the
          `tf.get_variable' functions.
        - Compute z, r, o and @new_state (h_t) defined above
    Tips:
        - Remember to initialize your matrices using the xavier
          initialization as before.
    Args:
        inputs: is the input vector of size [None, self.input_size]
        state: is the previous state vector of size [None, self.state_size]
        scope: is the name of the scope to be used when defining the variables inside.
    Returns:
        a pair of the output vector and the new state vector.
    """
    scope = scope or type(self).__name__

    # It's always a good idea to scope variables in functions lest they
    # be defined elsewhere!
    with tf.variable_scope(scope):
        ### YOUR CODE HERE (~20-30 lines)
        W_z = tf.get_variable('W_z', [self._state_size, self._state_size], dtype=tf.float32,
                              initializer=tf.contrib.layers.xavier_initializer())
        U_z = tf.get_variable('U_z', [self.input_size, self._state_size], dtype=tf.float32,
                              initializer=tf.contrib.layers.xavier_initializer())
        b_z = tf.get_variable('b_z', [self._state_size], dtype=tf.float32,
                              initializer=tf.constant_initializer(0))
        W_r = tf.get_variable('W_r', [self._state_size, self._state_size], dtype=tf.float32,
                              initializer=tf.contrib.layers.xavier_initializer())
        U_r = tf.get_variable('U_r', [self.input_size, self._state_size], dtype=tf.float32,
                              initializer=tf.contrib.layers.xavier_initializer())
        b_r = tf.get_variable('b_r', [self._state_size], dtype=tf.float32,
                              initializer=tf.constant_initializer(0))
        W_o = tf.get_variable('W_o', [self._state_size, self._state_size], dtype=tf.float32,
                              initializer=tf.contrib.layers.xavier_initializer())
        U_o = tf.get_variable('U_o', [self.input_size, self._state_size], dtype=tf.float32,
                              initializer=tf.contrib.layers.xavier_initializer())
        b_o = tf.get_variable('b_o', [self._state_size], dtype=tf.float32,
                              initializer=tf.constant_initializer(0))
        # Update gate
        z_t = tf.nn.sigmoid(tf.matmul(inputs, U_z) + tf.matmul(state, W_z) + b_z)
        # Reset gate
        r_t = tf.nn.sigmoid(tf.matmul(inputs, U_r) + tf.matmul(state, W_r) + b_r)
        # Candidate state
        h_ = tf.nn.tanh(tf.matmul(inputs, U_o) + tf.matmul(r_t * state, W_o) + b_o)
        new_state = tf.multiply(z_t, state) + tf.multiply(1 - z_t, h_)
        ### END YOUR CODE ###
    # For a GRU, the output and state are the same (N.B. this isn't true
    # for an LSTM, though we aren't using one of those in our
    # assignment)
    output = new_state
    return output, new_state
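The four GRU equations can be traced by hand with a small pure-Python sketch for a single example. The helper names (`matvec`, `sigmoid`, `gru_step`, `zero_params`) and the toy parameter values are assumptions made for illustration; they are not part of the assignment code.

```python
import math

def matvec(x, W):
    """Row vector x (length n) times matrix W (n x m), giving a row vector of length m."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x_t, h_prev, p):
    """One GRU step; p maps names such as "U_z", "W_z", "b_z" to weights."""
    def affine(x, U, h, W, b):
        return [a + c + d for a, c, d in zip(matvec(x, U), matvec(h, W), b)]

    z = [sigmoid(v) for v in affine(x_t, p["U_z"], h_prev, p["W_z"], p["b_z"])]  # update gate
    r = [sigmoid(v) for v in affine(x_t, p["U_r"], h_prev, p["W_r"], p["b_r"])]  # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)]                               # r_t * h_{t-1}
    o = [math.tanh(v) for v in affine(x_t, p["U_o"], gated, p["W_o"], p["b_o"])] # candidate
    return [zi * hi + (1 - zi) * oi for zi, hi, oi in zip(z, h_prev, o)]

def zero_params(input_size, state_size, b_z=0.0):
    """All-zero toy parameters, with an optional constant bias for the update gate."""
    p = {}
    for gate in "zro":
        p["U_" + gate] = [[0.0] * state_size for _ in range(input_size)]
        p["W_" + gate] = [[0.0] * state_size for _ in range(state_size)]
        p["b_" + gate] = [0.0] * state_size
    p["b_z"] = [b_z] * state_size
    return p

h_prev = [1.0, -2.0]
half = gru_step([1.0], h_prev, zero_params(1, 2))            # z = 0.5, candidate = 0
kept = gru_step([1.0], h_prev, zero_params(1, 2, b_z=50.0))  # z close to 1
```

With all-zero weights the update gate is 0.5 and the candidate is 0, so the state is halved; with a large update-gate bias, z_t is close to 1 and the previous state is carried through almost unchanged, which is the path that lets information and gradients survive over many steps.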

def add_prediction_op(self):
    """Runs an rnn on the input using TensorFlow's
    @tf.nn.dynamic_rnn function, and returns the final state as a prediction.

    TODO:
        - Call tf.nn.dynamic_rnn using @cell below. See:
          https://www.tensorflow.org/api_docs/python/nn/recurrent_neural_networks
        - Apply a sigmoid transformation on the final state to
          normalize the inputs between 0 and 1.

    Returns:
        preds: tf.Tensor of shape (batch_size, 1)
    """
    # Pick out the cell to use here.
    if self.config.cell == "rnn":
        cell = RNNCell(1, 1)
    elif self.config.cell == "gru":
        cell = GRUCell(1, 1)
    elif self.config.cell == "lstm":
        cell = tf.nn.rnn_cell.LSTMCell(1)
    else:
        raise ValueError("Unsupported cell type.")

    x = self.inputs_placeholder
    ### YOUR CODE HERE (~2-3 lines)
    # dynamic_rnn returns (outputs, final_state); index 1 is the final state.
    preds = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)[1]
    preds = tf.nn.sigmoid(preds)
    ### END YOUR CODE

    return preds #state # preds

def add_training_op(self, loss):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=self.config.lr)

    ### YOUR CODE HERE (~6-10 lines)
    # - Remember to clip gradients only if self.config.clip_gradients
    # is True.
    # - Remember to set self.grad_norm
    grad_and_var = optimizer.compute_gradients(loss)
    gradients = [item[0] for item in grad_and_var]
    variables = [item[1] for item in grad_and_var]
    if self.config.clip_gradients:
        clipped_grad = tf.clip_by_global_norm(gradients, clip_norm=self.config.max_grad_norm)[0]
        gradients = clipped_grad
    grad_and_var = list(zip(gradients, variables))
    # grad_norm should be a scalar: the global norm of the (possibly clipped)
    # gradients, not the list of gradient tensors itself.
    self.grad_norm = tf.global_norm(gradients)
    train_op = optimizer.apply_gradients(grad_and_var)
    ### END YOUR CODE

    assert self.grad_norm is not None, "grad_norm was not set properly!"
    return train_op
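Clipping by global norm rescales all gradients by the same factor, clip_norm / max(global_norm, clip_norm), so their direction is preserved. A minimal pure-Python sketch of that rule follows; the function names mirror the TensorFlow ones for readability, but this is an illustrative re-implementation with made-up gradient values, not the library code.

```python
import math

def global_norm(grads):
    """L2 norm over all entries of all gradients taken together."""
    return math.sqrt(sum(g * g for grad in grads for g in grad))

def clip_by_global_norm(grads, clip_norm):
    """Scale every gradient by clip_norm / max(global_norm, clip_norm).

    Returns the rescaled gradients and the norm measured before clipping.
    """
    norm = global_norm(grads)
    scale = clip_norm / max(norm, clip_norm)
    return [[g * scale for g in grad] for grad in grads], norm

# Hypothetical gradients for two variables; their global norm is 5.
grads = [[3.0], [4.0]]
clipped, norm_before = clip_by_global_norm(grads, clip_norm=1.0)
norm_after = global_norm(clipped)

# Gradients that are already small enough are left unchanged.
small, _ = clip_by_global_norm([[0.3], [0.4]], clip_norm=1.0)
```

Clipping only caps gradients that are too large; it cannot enlarge gradients that have already shrunk toward zero.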

[Figure: RNN]

[Figure: GRU]

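Solution:

i) Both the RNN and the GRU suffer from vanishing gradients, but the RNN's gradients vanish faster, so gradient clipping does not help either.

ii) The GRU can effectively mitigate vanishing gradients.

Results: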

DEBUG:Token-level confusion matrix:
go\gu           PER             ORG             LOC             MISC            O
PER             2998.00         20.00           17.00           24.00           90.00
ORG             140.00          1639.00         75.00           108.00          130.00
LOC             59.00           82.00           1868.00         39.00           46.00
MISC            42.00           21.00           31.00           1045.00         129.00
O               26.00           42.00           9.00            37.00           42645.00
DEBUG:Token-level scores:
label   acc     prec    rec     f1
PER     0.99    0.92    0.95    0.93
ORG     0.99    0.91    0.78    0.84
LOC     0.99    0.93    0.89    0.91
MISC    0.99    0.83    0.82    0.83
O       0.99    0.99    1.00    0.99
micro   0.99    0.98    0.98    0.98
macro   0.99    0.92    0.89    0.90
not-O   0.99    0.91    0.88    0.89
INFO:Entity level P/R/F1: 0.85/0.86/0.86
