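Formulas

The formulas below ignore the bias terms. Because the input vector and the hidden state have different lengths, the W in each formula is split into one part for x and one part for h. This is a practical difference from the way the formulas are usually written in theory.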

Reset gate
$r_t = \sigma(W_{ir}x_t + W_{hr}h_{t-1})$ The GRU parameters here are $W_{ir}$ and $W_{hr}$.
Update gate
$z_t = \sigma(W_{iz}x_t + W_{hz}h_{t-1})$ The GRU parameters here are $W_{iz}$ and $W_{hz}$.
Candidate state
$n_t = \tanh(W_{in}x_t + r_t \odot (W_{hn}h_{t-1}))$ The GRU parameters here are $W_{in}$ and $W_{hn}$. Here $\odot$ is the element-wise product.
This plays the same role as $g(t)$ in LSTM, except that the reset gate additionally scales the contribution of $h_{t-1}$.
Updating $h_t$
$h_t = (1 - z_t) \odot n_t + z_t \odot h_{t-1}$
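To make the four equations concrete, here is a minimal pure-Python sketch of one bias-free GRU time step. The helper names (`sigmoid`, `matvec`, `gru_step`) and the toy weights are made up for illustration and are not part of PyTorch; the same two toy matrices are reused for all three gates only to keep the example short.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(W, x):
    # Multiply a matrix (given as a list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def gru_step(x_t, h_prev, W_ir, W_hr, W_iz, W_hz, W_in, W_hn):
    """One bias-free GRU step following the equations above (hypothetical helper)."""
    # Reset gate r_t and update gate z_t
    r = [sigmoid(a + b) for a, b in zip(matvec(W_ir, x_t), matvec(W_hr, h_prev))]
    z = [sigmoid(a + b) for a, b in zip(matvec(W_iz, x_t), matvec(W_hz, h_prev))]
    # Candidate state n_t: the reset gate scales the hidden contribution element-wise
    n = [math.tanh(a + ri * b)
         for a, ri, b in zip(matvec(W_in, x_t), r, matvec(W_hn, h_prev))]
    # New hidden state h_t: blend of candidate state and previous state
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h_prev)]

# Toy sizes: input_size = 3, hidden_size = 2
x_t = [1.0, 0.5, -1.0]
h_prev = [0.0, 0.0]
W_i = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]]  # shape (hidden_size, input_size)
W_h = [[0.5, 0.0], [0.0, 0.5]]             # shape (hidden_size, hidden_size)

h_t = gru_step(x_t, h_prev, W_i, W_h, W_i, W_h, W_i, W_h)
print(h_t)  # a list of hidden_size values, each strictly between -1 and 1
```

Note that every input-side matrix has hidden_size rows and input_size columns, while every hidden-side matrix is square with side hidden_size.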

So, in terms of the input tensor and the hidden-state tensor, there are two groups of parameters in total (ignoring the bias parameters):

input group {$W_{ir}$, $W_{iz}$, $W_{in}$}
hidden group {$W_{hr}$, $W_{hz}$, $W_{hn}$}

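Because hidden size is the length of the hidden-layer feature output, the first dimension of every individual weight is hidden_size. Each group stacks its three weights along that first dimension, so the stored tensors have a first dimension of 3 * hidden_size: weight_ih_l0 has shape (3 * hidden_size, input_size) and weight_hh_l0 has shape (3 * hidden_size, hidden_size).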

Example code

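    from torch import nn

    # Single-layer, unidirectional GRU without bias terms
    gru = nn.GRU(input_size=3, hidden_size=5, num_layers=1, bias=False)
    # With hidden_size=5 the first dimension is 3 * 5 = 15:
    # weight_ih_l0 is (15, 3) and weight_hh_l0 is (15, 5)
    print('weight_ih_l0.shape = ', gru.weight_ih_l0.shape, ', weight_hh_l0.shape = ', gru.weight_hh_l0.shape)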
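As a cross-check between the formulas and the stored weights, the sketch below splits weight_ih_l0 and weight_hh_l0 back into the per-gate matrices and recomputes one time step by hand. It assumes PyTorch's stacking order along the first dimension is reset, update, new (r, z, n); the variable names and the random test input are chosen only for this example.

```python
import torch
from torch import nn

torch.manual_seed(0)
input_size, hidden_size = 3, 5
gru = nn.GRU(input_size=input_size, hidden_size=hidden_size, num_layers=1, bias=False)

x_t = torch.randn(input_size)      # one time step, no batch dimension
h_prev = torch.randn(hidden_size)  # previous hidden state

with torch.no_grad():
    # Each stacked weight holds the three gates along dim 0, in the order r, z, n.
    W_ir, W_iz, W_in = gru.weight_ih_l0.chunk(3, dim=0)  # each (hidden_size, input_size)
    W_hr, W_hz, W_hn = gru.weight_hh_l0.chunk(3, dim=0)  # each (hidden_size, hidden_size)

    # Manual GRU step following the bias-free equations
    r_t = torch.sigmoid(W_ir @ x_t + W_hr @ h_prev)
    z_t = torch.sigmoid(W_iz @ x_t + W_hz @ h_prev)
    n_t = torch.tanh(W_in @ x_t + r_t * (W_hn @ h_prev))
    h_manual = (1 - z_t) * n_t + z_t * h_prev

    # The same step through nn.GRU: input is (seq_len=1, batch=1, input_size),
    # initial state is (num_layers=1, batch=1, hidden_size).
    _, h_n = gru(x_t.view(1, 1, -1), h_prev.view(1, 1, -1))

# Compare the hand-computed state with the module's result
print(torch.allclose(h_manual, h_n.view(-1), atol=1e-5))
```

If the gate order were assumed differently, the comparison would fail, so this is a quick way to confirm how the three weights are laid out.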

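Bidirectional GRU

To make the GRU bidirectional, you only need to add the argument bidirectional=True.

The shapes of the existing weights do not change. The reverse direction gets its own set of weights with the same shapes, exposed under names with the _reverse suffix.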

    from torch import nn

    gru = nn.GRU(input_size=3, hidden_size=5, num_layers=1, bidirectional=True, bias=False)
    # Forward and reverse weights have identical shapes: (15, 3) for ih and (15, 5) for hh
    print('weight_ih_l0.shape = ', gru.weight_ih_l0.shape, ', weight_ih_l0_reverse.shape = ', gru.weight_ih_l0_reverse.shape,
          '\nweight_hh_l0.shape = ', gru.weight_hh_l0.shape, ', weight_hh_l0_reverse.shape = ', gru.weight_hh_l0_reverse.shape)
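Beyond the weight shapes, it can help to see how bidirectional=True changes the outputs. The following is a small sketch with an arbitrary sequence length and batch size (both picked only for this example), using the default (seq_len, batch, input_size) layout.

```python
import torch
from torch import nn

hidden_size = 5
gru = nn.GRU(input_size=3, hidden_size=hidden_size, num_layers=1,
             bidirectional=True, bias=False)

seq_len, batch = 4, 2                # arbitrary sizes for the example
x = torch.randn(seq_len, batch, 3)   # (seq_len, batch, input_size)

with torch.no_grad():
    output, h_n = gru(x)

# output concatenates both directions on the last axis: (seq_len, batch, 2 * hidden_size)
print(output.shape)
# h_n holds one final state per direction: (num_layers * 2, batch, hidden_size)
print(h_n.shape)

# The forward direction finishes at the last time step,
# the reverse direction finishes at the first time step.
forward_final = output[-1, :, :hidden_size]
reverse_final = output[0, :, hidden_size:]
print(torch.allclose(forward_final, h_n[0]), torch.allclose(reverse_final, h_n[1]))

# Total number of weights: two directions, each with a (15, 3) and a (15, 5) matrix
print(sum(p.numel() for p in gru.parameters()))
```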

The concept of multiple layers

See https://blog.csdn.net/mimiduck/article/details/119975080
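As a quick illustration of what num_layers changes on the parameter side, the sketch below builds a two-layer unidirectional GRU and lists its weights. The sizes are the same ones used earlier in this post. The only difference for the second layer is that its input is the hidden output of the first layer, so its input-side weight has hidden_size columns instead of input_size.

```python
from torch import nn

gru = nn.GRU(input_size=3, hidden_size=5, num_layers=2, bias=False)

# Layer 0 reads the raw input (size 3); layer 1 reads layer 0's hidden output (size 5).
shapes = {name: tuple(param.shape) for name, param in gru.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
```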
