LayerNorm in PyTorch

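Compared with BatchNorm, LayerNorm has two advantages:

LN operates on a single training sample and does not depend on other data, so it avoids BN's sensitivity to the mini-batch data distribution. It can be used with small mini-batches, dynamic networks, and RNNs, and is especially common in natural language processing.
LN does not need to store the running mean and variance of mini-batches (as BN does for use at inference), which saves the extra storage.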
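
Both points can be checked with a short sketch. The batches below are made up for illustration (a fixed first row plus random rows), and the BatchNorm layer is left in its default training mode so that it uses batch statistics.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

ln = nn.LayerNorm(4)
bn = nn.BatchNorm1d(4)  # training mode by default, so it normalizes with batch statistics

# The same sample placed in two different (illustrative) batches.
sample = torch.tensor([[1.0, 2.0, 3.0, 4.0]])
batch_a = torch.cat([sample, torch.randn(2, 4)])
batch_b = torch.cat([sample, 10 * torch.randn(2, 4)])

# LayerNorm: the first row's output does not depend on the other rows.
ln_same = torch.allclose(ln(batch_a)[0], ln(batch_b)[0])

# BatchNorm: the first row's output changes with the rest of the batch.
bn_same = torch.allclose(bn(batch_a)[0], bn(batch_b)[0])
print(ln_same, bn_same)

# LayerNorm stores only the affine parameters; BatchNorm also stores running statistics.
ln_keys = list(ln.state_dict().keys())
bn_keys = list(bn.state_dict().keys())
print(ln_keys)
print(bn_keys)
```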

For a more detailed introduction, see "Model Optimization: Layer Normalization".

$$y=\frac{x-\mathrm{E}[x]}{\sqrt{\operatorname{Var}[x]+\epsilon}} * \gamma+\beta$$

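The formula looks the same as BN's, but here the mean and variance are computed over the different features of a single sample rather than across the samples in a batch.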
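
As a minimal sketch of the formula, the snippet below computes it by hand with tensor operations and compares the result with `torch.nn.functional.layer_norm`. The input values, and the choice of γ = 1 and β = 0, are assumptions made only for this illustration.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1.0, 2.0, 3.0, 4.0],
                  [2.0, 4.0, 6.0, 8.0]])
eps = 1e-5

# Statistics are taken along the feature axis (dim=-1): one mean and variance per sample.
mean = x.mean(dim=-1, keepdim=True)
var = x.var(dim=-1, unbiased=False, keepdim=True)  # biased (population) variance

gamma = torch.ones(4)   # identity scale, chosen for illustration
beta = torch.zeros(4)   # zero shift, chosen for illustration
y_manual = (x - mean) / torch.sqrt(var + eps) * gamma + beta

y_torch = F.layer_norm(x, normalized_shape=(4,), weight=gamma, bias=beta, eps=eps)
matches = torch.allclose(y_manual, y_torch, atol=1e-6)
print(matches)
```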

torch.nn.LayerNorm

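normalized_shape – the shape of the trailing dimensions to normalize over; can be an int, a list, or a torch.Size
eps – the ϵ in the formula above, which keeps the denominator from being 0; default 1e-5
elementwise_affine – whether to use the learnable parameters γ and β; default True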
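
A small sketch of how these arguments show up on the module. The variable names are only illustrative.

```python
import torch
import torch.nn as nn

with_affine = nn.LayerNorm(4)                          # elementwise_affine=True by default
no_affine = nn.LayerNorm(4, elementwise_affine=False)

# gamma is stored as .weight and beta as .bias, one value per normalized element.
print(with_affine.weight.shape, with_affine.bias.shape)
# Without the affine transform, both attributes are None.
print(no_affine.weight, no_affine.bias)
print(with_affine.eps)

# gamma starts at 1 and beta at 0, so both layers agree before any training.
x = torch.tensor([[1.0, 2.0, 3.0, 4.0]])
same_at_init = torch.allclose(with_affine(x), no_affine(x))

# An int and a one-element list describe the same normalized_shape.
shape_from_int = nn.LayerNorm(4).normalized_shape
shape_from_list = nn.LayerNorm([4]).normalized_shape
print(shape_from_int, shape_from_list)
```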

import math

import numpy as np
import torch
import torch.nn as nn


def validation(x):
    """
    Validation function: normalize each row of x by hand.
    :param x: 2-D list of numbers
    :return: row-wise normalized array
    """
    x = np.array(x)
    avg = np.mean(x, axis=1)   # per-row mean, one value per row, shape (3,)
    std2 = np.var(x, axis=1)   # per-row variance, one value per row, shape (3,)
    x_avg = [[item for item in avg] for _ in range(x.shape[1])]
    x_std = [[math.pow(item, 1 / 2) for item in std2] for _ in range(x.shape[1])]
    # eps is omitted here, so the match with nn.LayerNorm is approximate
    x_ = (x - np.array(x_avg).T) / np.array(x_std).T
    return x_


x = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]
# shape: 3*4
input = torch.tensor(x, dtype=torch.float)
m = nn.LayerNorm(4)
output = m(input)
print(output)

val = validation(x)
print(val)

Result:

tensor([[-1.3416, -0.4472,  0.4472,  1.3416],
        [-1.3416, -0.4472,  0.4472,  1.3416],
        [-1.3416, -0.4472,  0.4472,  1.3416]], grad_fn=<NativeLayerNormBackward>)
[[-1.34164079 -0.4472136   0.4472136   1.34164079]
 [-1.34164079 -0.4472136   0.4472136   1.34164079]
 [-1.34164079 -0.4472136   0.4472136   1.34164079]]

Passing a torch.Size as normalized_shape to normalize over more than one dimension

x = [[[1, 1], [2, 2], [3, 3], [4, 4]], [[2, 2], [3, 3], [4, 4], [5, 5]], [[3, 3], [4, 4], [5, 5], [6, 6]]]
input = torch.tensor(x, dtype=torch.float)
normalized_shape = input.size()[1:]
print(normalized_shape)
m = nn.LayerNorm(normalized_shape)
output = m(input)
print(output)

Output:

torch.Size([4, 2])
tensor([[[-1.3416, -1.3416],
         [-0.4472, -0.4472],
         [ 0.4472,  0.4472],
         [ 1.3416,  1.3416]],

        [[-1.3416, -1.3416],
         [-0.4472, -0.4472],
         [ 0.4472,  0.4472],
         [ 1.3416,  1.3416]],

        [[-1.3416, -1.3416],
         [-0.4472, -0.4472],
         [ 0.4472,  0.4472],
         [ 1.3416,  1.3416]]], grad_fn=<NativeLayerNormBackward>)

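Each sample's 4*2 block is normalized as a whole, with one mean and one variance computed over all 8 of its elements. Because the two columns in this example are identical, the result is the same as processing two 3*4 matrices separately.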
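
To see which elements share statistics when normalized_shape covers the last two dimensions, it helps to use an input whose two columns differ. The sketch below does that with made-up values and compares the layer's output with a manual computation over all 8 elements and with a per-column normalization.

```python
import torch
import torch.nn as nn

# One sample of shape 4*2 whose two columns differ (values chosen for illustration).
x = torch.tensor([[[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]])

joint = nn.LayerNorm(x.size()[1:])(x)   # normalized_shape = torch.Size([4, 2])
last_only = nn.LayerNorm(2)(x)          # normalizes each pair of 2 elements instead
print(joint.shape, last_only.shape)

# Manual version of the joint case: flatten the 4*2 block, normalize, reshape back.
flat = x.reshape(1, -1)
mean = flat.mean(dim=1, keepdim=True)
var = flat.var(dim=1, unbiased=False, keepdim=True)
manual = ((flat - mean) / torch.sqrt(var + 1e-5)).reshape(x.shape)
joint_matches_manual = torch.allclose(joint, manual, atol=1e-5)

# Normalizing each column on its own gives a different result for this input.
per_column = nn.LayerNorm(4)(x.transpose(1, 2)).transpose(1, 2)
joint_matches_per_column = torch.allclose(joint, per_column, atol=1e-5)
print(joint_matches_manual, joint_matches_per_column)
```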
