BatchNorm, LayerNorm, InstanceNorm, and GroupNorm
Refs:
深度学习中的Normalization方法 (Normalization methods in deep learning)
Denote the input shape as [N, C, H, W]. Computationally, the main difference among the normalization variants lies in the axes over which the statistics are computed (see the summary sketch after this list):
- Batch Normalization: normalizes over N, H, W (along the batch direction, one set of statistics per channel); it is sensitive to the batch size;
- Layer Normalization: normalizes over C, H, W (along the channel direction, one set of statistics per sample);
- Instance Normalization: normalizes over H, W (per sample and per channel); often used in style transfer;
- Group Normalization: splits the channels into groups, [N, C, H, W] --> [N, g, C//g, H, W], then normalizes over the last three dimensions (it resembles both InstanceNorm and LayerNorm);
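As a compact reference (a minimal sketch that just restates the list above in code), the mean for each variant reduces over the following axes of an input x of shape [N, C, H, W]:

import torch

N, C, H, W = 3, 5, 2, 2
x = torch.rand(N, C, H, W)
x.mean(dim=(0, 2, 3))  # BatchNorm: one mean per channel, over N, H, W
x.mean(dim=(1, 2, 3))  # LayerNorm: one mean per sample, over C, H, W
x.mean(dim=(2, 3))     # InstanceNorm: one mean per (sample, channel) pair, over H, W
# GroupNorm: reshape to [N, g, C//g, H, W] first, then mean over the last three dims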
The PyTorch examples below for each normalization all ignore the learnable affine parameters $\gamma$ and $\beta$.
(If anything below is written incorrectly, please point it out.)
1. BatchNorm2d
API: CLASS torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, device=None, dtype=None)
$$y = \frac{x - \text{E}[x]}{\sqrt{\text{Var}[x] + \epsilon}} * \gamma + \beta$$
- Input: [N, C, H, W]
- Output: [N, C, H, W]
The mean and variance are computed over N, H, W; example code:
import torch

# input data
N, C, H, W = 3, 5, 2, 2
x = torch.rand(N, C, H, W)  # [N, C, H, W]

# nn.BatchNorm2d computation
bn_layer = torch.nn.BatchNorm2d(C, eps=0., affine=False, track_running_stats=False)
x_out_1 = bn_layer(x)  # [N, C, H, W]

# computation from the definition
mean_x = x.mean((0, 2, 3))  # mean and variance over N, H, W
std_x = x.std((0, 2, 3), unbiased=False)
x_out_2 = (x - mean_x[None, :, None, None]) / std_x[None, :, None, None]
# x_out_1 should equal x_out_2
"""
>>> x_out_1.view(3, 5, -1)
tensor([[[ 0.5701, 1.3119, 0.6911, -1.5281],[-0.2640, 0.6958, -0.4879, 2.6233],[ 1.5883, 1.3217, -0.9401, -0.8484],[ 0.6178, 0.7098, 0.6252, -0.1542],[-1.0076, -0.6226, 0.6902, -0.9112]],[[ 1.0838, 0.4721, 1.2620, -0.9831],[-0.0582, 0.7492, -0.1682, -0.8531],[ 0.2192, -0.9547, -0.8769, -1.0408],[-1.6932, 0.2731, -1.1455, -0.9619],[-0.3389, -0.1145, -0.2434, -1.3969]],[[-1.3717, -1.0275, 0.0167, -0.4972],[-0.1614, -0.5248, 0.0912, -1.6418],[ 1.6850, -0.3543, -0.3061, 0.5070],[-0.7309, -0.5870, 1.5495, 1.4972],[-0.5570, 1.7374, 1.4123, 1.3522]]])
>>> x_out_2.view(3, 5, -1)
tensor([[[ 0.5701, 1.3119, 0.6911, -1.5281],[-0.2640, 0.6958, -0.4879, 2.6233],[ 1.5883, 1.3217, -0.9401, -0.8484],[ 0.6178, 0.7098, 0.6252, -0.1542],[-1.0076, -0.6226, 0.6902, -0.9112]],[[ 1.0838, 0.4721, 1.2620, -0.9831],[-0.0582, 0.7492, -0.1682, -0.8531],[ 0.2192, -0.9547, -0.8769, -1.0408],[-1.6932, 0.2731, -1.1455, -0.9619],[-0.3389, -0.1145, -0.2434, -1.3969]],[[-1.3717, -1.0275, 0.0167, -0.4972],[-0.1614, -0.5248, 0.0912, -1.6418],[ 1.6850, -0.3543, -0.3061, 0.5070],[-0.7309, -0.5870, 1.5495, 1.4972],[-0.5570, 1.7374, 1.4123, 1.3522]]])
"""
2. LayerNorm
API: CLASS torch.nn.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True, device=None, dtype=None)
$$y = \frac{x - \text{E}[x]}{\sqrt{\text{Var}[x] + \epsilon}} * \gamma + \beta$$
The mean and variance are computed over C, H, W; example code:
- If the input has the form [N, C, H, W], i.e. the Image Example in the API docs:
import torch

N, C, H, W = 3, 5, 2, 2
x = torch.rand(N, C, H, W)

# nn.LayerNorm computation
ln_layer = torch.nn.LayerNorm([C, H, W], eps=0., elementwise_affine=False)
x_out_1 = ln_layer(x)  # [N, C, H, W]

# computation from the definition
x_mean = x.mean([1, 2, 3])
x_std = x.std([1, 2, 3], unbiased=False)
x_out_2 = (x - x_mean[:, None, None, None]) / x_std[:, None, None, None]
# x_out_1 should equal x_out_2
"""
>>> x_out_1.view(3, 5, -1)
tensor([[[ 0.6501, 1.5364, 0.7946, -1.8568],[-0.6528, 0.1166, -0.8324, 1.6618],[ 1.2697, 0.9812, -1.4672, -1.3680],[ 0.4042, 0.4850, 0.4107, -0.2747],[-0.8916, -0.5898, 0.4390, -0.8160]],[[ 2.0325, 1.1983, 2.2755, -0.7861],[ 0.0332, 0.7719, -0.0675, -0.6942],[ 0.3477, -1.1027, -1.0066, -1.2091],[-1.2681, 0.7054, -0.7184, -0.5341],[ 0.1706, 0.3713, 0.2560, -0.7758]],[[-1.5146, -1.0963, 0.1726, -0.4520],[-0.3965, -0.6928, -0.1905, -1.6036],[ 1.5817, -0.6635, -0.6105, 0.2847],[-0.6113, -0.4826, 1.4282, 1.3815],[-0.3637, 1.4651, 1.2060, 1.1581]]])
>>> x_out_2.view(3, 5, -1)
tensor([[[ 0.6501, 1.5364, 0.7946, -1.8568],[-0.6528, 0.1166, -0.8324, 1.6618],[ 1.2697, 0.9812, -1.4672, -1.3680],[ 0.4042, 0.4850, 0.4107, -0.2747],[-0.8916, -0.5898, 0.4390, -0.8160]],[[ 2.0325, 1.1983, 2.2755, -0.7861],[ 0.0332, 0.7719, -0.0675, -0.6942],[ 0.3477, -1.1027, -1.0066, -1.2091],[-1.2681, 0.7054, -0.7184, -0.5341],[ 0.1706, 0.3713, 0.2560, -0.7758]],[[-1.5146, -1.0963, 0.1726, -0.4520],[-0.3965, -0.6928, -0.1905, -1.6036],[ 1.5817, -0.6635, -0.6105, 0.2847],[-0.6113, -0.4826, 1.4282, 1.3815],[-0.3637, 1.4651, 1.2060, 1.1581]]])
"""
- If the input has the form [N, L, C], i.e. the NLP Example in the API docs (the usual data layout in image captioning), the mean and variance are computed over C, i.e. over the last dimension of the input:
import torch

N, L, C = 3, 4, 5
x = torch.rand(N, L, C)

# nn.LayerNorm computation
ln_layer = torch.nn.LayerNorm(C, eps=0., elementwise_affine=False)
x_out_1 = ln_layer(x)  # [N, L, C]

# computation from the definition
x_mean = x.mean(-1)
x_std = x.std(-1, unbiased=False)
x_out_2 = (x - x_mean[:, :, None]) / x_std[:, :, None]
# x_out_1 should equal x_out_2
"""
>>> x_out_1
tensor([[[-0.2380, -0.2267, 1.9469, -0.7811, -0.7011],[ 1.8029, -1.1073, -0.5569, 0.2497, -0.3884],[-1.1464, -0.3209, 1.6030, 0.6406, -0.7764],[-0.0740, -1.6507, 0.7076, 1.2997, -0.2825]],[[ 0.7822, -0.4960, 0.9142, 0.5369, -1.7373],[-1.7976, -0.0445, 0.3672, 1.2590, 0.2159],[ 0.7396, 1.1869, -1.1813, -1.2006, 0.4555],[ 1.0684, 1.0592, -1.0150, 0.1810, -1.2937]],[[-0.4093, 1.6552, 0.4399, -0.3537, -1.3320],[-0.8034, 0.9525, 1.3389, -1.2672, -0.2208],[ 0.2419, -1.4972, -0.6267, 0.4232, 1.4588],[-0.7910, -1.3169, -0.1005, 1.4134, 0.7951]]])
>>> x_out_2
tensor([[[-0.2380, -0.2267, 1.9469, -0.7811, -0.7011],[ 1.8029, -1.1073, -0.5569, 0.2497, -0.3884],[-1.1464, -0.3209, 1.6030, 0.6406, -0.7764],[-0.0740, -1.6507, 0.7076, 1.2997, -0.2825]],[[ 0.7822, -0.4960, 0.9142, 0.5369, -1.7373],[-1.7976, -0.0445, 0.3672, 1.2590, 0.2159],[ 0.7396, 1.1869, -1.1813, -1.2006, 0.4555],[ 1.0684, 1.0592, -1.0150, 0.1810, -1.2937]],[[-0.4093, 1.6552, 0.4399, -0.3537, -1.3320],[-0.8034, 0.9525, 1.3389, -1.2672, -0.2208],[ 0.2419, -1.4972, -0.6267, 0.4232, 1.4588],[-0.7910, -1.3169, -0.1005, 1.4134, 0.7951]]])
"""
3. InstanceNorm2d
API: CLASS torch.nn.InstanceNorm2d(num_features, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False, device=None, dtype=None)
$$y = \frac{x - \text{E}[x]}{\sqrt{\text{Var}[x] + \epsilon}} * \gamma + \beta$$
- Input: [N, C, H, W] or [C, H, W]
- Output: [N, C, H, W] or [C, H, W] (same shape as the input)
The mean and variance are computed over H, W; example code:
import torch

N, C, H, W = 3, 5, 2, 2
x = torch.rand(N, C, H, W)

# nn.InstanceNorm2d computation
in_layer = torch.nn.InstanceNorm2d(C, eps=0., affine=False, track_running_stats=False)
x_out_1 = in_layer(x)

# computation from the definition
x_mean = x.mean((-1, -2))
x_std = x.std((-1, -2), unbiased=False)
x_out_2 = (x - x_mean[:, :, None, None]) / x_std[:, :, None, None]
# x_out_1 should equal x_out_2
"""
>>> x_out_1.view(3, 5, -1)
tensor([[[ 1.5311, 0.2455, -0.9808, -0.7958],[-0.9634, -0.1927, -0.5097, 1.6658],[ 0.6410, 1.2393, -1.3178, -0.5626],[ 1.2098, -1.1274, 0.7531, -0.8355],[-0.5653, 1.6005, 0.0223, -1.0575]],[[ 0.5257, 1.3195, -1.2969, -0.5484],[-0.7867, -1.1250, 1.3355, 0.5762],[ 1.0166, 0.2899, -1.6605, 0.3540],[ 0.3264, 1.4894, -0.7927, -1.0231],[ 1.3285, 0.4290, -0.3755, -1.3820]],[[-1.5102, -0.2778, 1.0419, 0.7461],[ 0.3678, -1.1810, -0.6277, 1.4408],[ 1.7132, -0.5482, -0.3753, -0.7897],[ 0.7189, 0.9644, -0.0879, -1.5954],[-0.2354, -0.9607, 1.6719, -0.4759]]])
>>> x_out_2.view(3, 5, -1)
tensor([[[ 1.5311, 0.2455, -0.9808, -0.7958],[-0.9634, -0.1927, -0.5097, 1.6658],[ 0.6410, 1.2393, -1.3178, -0.5626],[ 1.2098, -1.1274, 0.7531, -0.8355],[-0.5653, 1.6005, 0.0223, -1.0575]],[[ 0.5257, 1.3195, -1.2969, -0.5484],[-0.7867, -1.1250, 1.3355, 0.5762],[ 1.0166, 0.2899, -1.6605, 0.3540],[ 0.3264, 1.4894, -0.7927, -1.0231],[ 1.3285, 0.4290, -0.3755, -1.3820]],[[-1.5102, -0.2778, 1.0419, 0.7461],[ 0.3678, -1.1810, -0.6277, 1.4408],[ 1.7132, -0.5482, -0.3753, -0.7897],[ 0.7189, 0.9644, -0.0879, -1.5954],[-0.2354, -0.9607, 1.6719, -0.4759]]])
"""
4. GroupNorm
API: CLASS torch.nn.GroupNorm(num_groups, num_channels, eps=1e-05, affine=True, device=None, dtype=None)
$$y = \frac{x - \text{E}[x]}{\sqrt{\text{Var}[x] + \epsilon}} * \gamma + \beta$$
- Input: [N, C, *]
- Output: [N, C, *] (same shape as the input)
Let the input shape be [N, C, H, W]. The channels are first split into groups, [N, C, H, W] --> [N, g, C//g, H, W], and the mean and variance are then computed over C//g, H, W (the last three dimensions); example code:
import torch

N, C, H, W = 3, 6, 2, 2
x = torch.rand(N, C, H, W)  # [N, C, H, W]

# nn.GroupNorm computation: split C=6 into 2 groups
gn_layer = torch.nn.GroupNorm(num_groups=2, num_channels=C, eps=0., affine=False)
x_out_1 = gn_layer(x)

# computation from the definition
x = x.view(N, 2, C // 2, H, W)  # [N, C, H, W] --> [N, g, C//g, H, W]
x_mean = x.mean((2, 3, 4))
x_std = x.std((2, 3, 4), unbiased=False)
x_out_2 = (x - x_mean[:, :, None, None, None]) / x_std[:, :, None, None, None]
x_out_2 = x_out_2.view(N, C, H, W)
# x_out_1 should equal x_out_2
"""
>>> x_out_1.view(3, 6, -1)
tensor([[[-0.1290, -1.5416, 1.3508, 0.0259],[ 0.8220, -1.1861, -0.8968, 0.5246],[-1.1964, -0.2531, 1.0820, 1.3977],[ 0.6300, -1.5861, -1.6701, -0.3855],[ 0.6316, 1.1035, -0.2076, 0.7945],[ 0.9343, 0.2422, -1.4284, 0.9415]],[[-1.4208, -0.4870, 0.4255, -0.7972],[ 1.8013, 0.3366, 1.8382, -0.7250],[ 0.5121, -0.9930, 0.1396, -0.6302],[ 0.2940, 0.9422, 0.2082, -0.0493],[ 1.6209, -0.2877, -1.0879, 0.6238],[-0.5238, -1.7207, 1.3058, -1.3255]],[[ 0.1376, -1.6736, 1.5494, -0.6100],[-0.3534, 0.5688, -0.2642, 0.5488],[-0.8490, 1.9884, -0.0916, -0.9512],[-0.6563, 1.4381, 1.5124, 1.1264],[-0.9688, -0.5808, 0.1888, 0.0883],[-1.2760, -0.8207, 1.0518, -1.1034]]])
>>> x_out_2.view(3, 6, -1)
tensor([[[-0.1290, -1.5416, 1.3508, 0.0259],[ 0.8220, -1.1861, -0.8968, 0.5246],[-1.1964, -0.2531, 1.0820, 1.3977],[ 0.6300, -1.5861, -1.6701, -0.3855],[ 0.6316, 1.1035, -0.2076, 0.7945],[ 0.9343, 0.2422, -1.4284, 0.9415]],[[-1.4208, -0.4870, 0.4255, -0.7972],[ 1.8013, 0.3366, 1.8382, -0.7250],[ 0.5121, -0.9930, 0.1396, -0.6302],[ 0.2940, 0.9422, 0.2082, -0.0493],[ 1.6209, -0.2877, -1.0879, 0.6238],[-0.5238, -1.7207, 1.3058, -1.3255]],[[ 0.1376, -1.6736, 1.5494, -0.6100],[-0.3534, 0.5688, -0.2642, 0.5488],[-0.8490, 1.9884, -0.0916, -0.9512],[-0.6563, 1.4381, 1.5124, 1.1264],[-0.9688, -0.5808, 0.1888, 0.0883],[-1.2760, -0.8207, 1.0518, -1.1034]]])
"""