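[Learning Rate] Ten Learning-Rate Adjustment Methods in torch.optim.lr_scheduler
The learning rate is one of the most important hyperparameters in network training: a well-chosen learning rate speeds up convergence and helps the model avoid poor local optima. I got stuck in a loop over the past few days, tormented by learning rates, so here is a record of the learning-rate adjustment methods in lr_scheduler.
Where the scheduler sits in the training loop, and how to check the current learning rate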
import torch
import torch.nn as nn
import torch.optim

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9,
                            weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, 20, eta_min=0, last_epoch=-1)

for epoch in range(100):
    now_lr = optimizer.state_dict()['param_groups'][0]['lr']  # check the current learning rate
    train(...)
    test(...)
    scheduler.step()  # update the learning rate
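The loop above leaves train(...) and test(...) abstract. As a minimal runnable sketch of the same ordering, the snippet below uses a hypothetical nn.Linear model, a random batch, and a dummy loss (none of which come from the original post). It calls optimizer.step() before scheduler.step() and reads the learning rate with scheduler.get_last_lr(), which should return the same value as the param_groups lookup.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # hypothetical stand-in for a real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)

lrs = []
for epoch in range(3):
    inputs = torch.randn(8, 4)              # dummy batch, for illustration only
    loss = model(inputs).pow(2).mean()      # dummy loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                        # update the weights first
    lrs.append(scheduler.get_last_lr()[0])  # learning rate used in this epoch
    scheduler.step()                        # then update the learning rate
```

Stepping the scheduler after the optimizer is the order PyTorch expects; reversing it triggers a warning in recent versions.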
1. LambdaLR()
LambdaLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)
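(1) Schedule: the initial lr multiplied by the value of a given function of the epoch.
(2) Common parameters:
[1] optimizer: the wrapped optimizer;
[2] last_epoch: defaults to -1, which means the learning rate starts from its initial value;
[3] verbose: defaults to False; if True, a message is printed on every learning-rate update;
(3) Parameters:
[1] lr_lambda: the given function, which takes the epoch index and returns a multiplicative factor;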
import torch
import torch.nn as nn
import torch.optim
import matplotlib.pyplot as plt

model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.01, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
lambda1 = lambda epoch: 0.9 ** epoch
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda1)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(100):
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    y.append(lr)
    scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('LambdaLR')
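As a rough cross-check on the curve, the rule "initial lr times lr_lambda(epoch)" can be written in plain Python. The helper name lambda_lr is hypothetical and this is only a sketch of the formula, not PyTorch's implementation.

```python
def lambda_lr(base_lr, lr_lambda, epochs):
    """Learning rate per epoch under the rule lr = base_lr * lr_lambda(epoch)."""
    return [base_lr * lr_lambda(epoch) for epoch in range(epochs)]


# Same settings as the example above: base lr 0.01, factor 0.9 ** epoch.
lambda_lrs = lambda_lr(0.01, lambda epoch: 0.9 ** epoch, 5)
```

Every value is computed from the initial learning rate, so the factor returned by lr_lambda is not compounded from one epoch to the next.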
2. MultiplicativeLR()
MultiplicativeLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)
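(1) Schedule: the current lr multiplied by the value of a given function of the epoch.
(2) Parameters:
[1] lr_lambda: the given function, which takes the epoch index and returns a multiplicative factor;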
model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.01, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
lambda1 = lambda epoch: 0.9 ** epoch
scheduler = torch.optim.lr_scheduler.MultiplicativeLR(optimizer, lr_lambda=lambda1)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(100):
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    y.append(lr)
    scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('MultiplicativeLR')
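Because the factor is applied to the current learning rate, the factors compound. The plain-Python sketch below (hypothetical helper multiplicative_lr, assuming the factor is first applied at epoch 1) shows that with 0.9 ** epoch the learning rate after t epochs is the initial lr times 0.9 ** (1 + 2 + ... + t), which falls much faster than in the LambdaLR case.

```python
def multiplicative_lr(base_lr, lr_lambda, epochs):
    """Learning rate per epoch under the rule lr_t = lr_(t-1) * lr_lambda(t)."""
    lrs = [base_lr]
    for epoch in range(1, epochs):
        lrs.append(lrs[-1] * lr_lambda(epoch))
    return lrs


# Same settings as the example above: base lr 0.01, factor 0.9 ** epoch.
mult_lrs = multiplicative_lr(0.01, lambda epoch: 0.9 ** epoch, 4)
```

For a plain geometric decay with this scheduler, a constant factor such as lambda epoch: 0.9 would be the more usual choice.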
3. StepLR()
StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1, verbose=False)
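(1) Schedule: the learning rate is multiplied by gamma at fixed intervals of step_size epochs.
(2) Parameters:
[1] step_size: the fixed interval, in epochs;
[2] gamma: the multiplicative decay factor;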
model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(100):
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    y.append(lr)
    scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('StepLR')
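The staircase in the plot corresponds to a simple closed form. The helper step_lr below is a hypothetical plain-Python sketch of that formula, not the library code.

```python
def step_lr(epoch, base_lr, step_size, gamma=0.1):
    """Closed form of the staircase: decay by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


# Same settings as the example above: base lr 0.1, step_size=20, gamma=0.1.
step_lrs = [step_lr(epoch, 0.1, 20) for epoch in range(100)]
```

With these settings the learning rate stays at 0.1 for epochs 0 to 19 and is divided by 10 at epochs 20, 40, 60 and 80.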
4. MultiStepLR()
MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1, verbose=False)
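(1) Schedule: the learning rate is multiplied by gamma each time the epoch reaches one of the specified milestones.
(2) Parameters:
[1] milestones: list of epoch indices at which to decay; must be increasing;
[2] gamma: the multiplicative decay factor;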
model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40, 70], gamma=0.5)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(100):
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    y.append(lr)
    scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('MultiStepLR')
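The number of decays applied at a given epoch is just the number of milestones already reached, which bisect_right counts directly. The helper multi_step_lr is a hypothetical sketch of that closed form.

```python
from bisect import bisect_right


def multi_step_lr(epoch, base_lr, milestones, gamma=0.1):
    """Decay by gamma once for every milestone that is <= epoch."""
    return base_lr * gamma ** bisect_right(milestones, epoch)


# Same settings as the example above: base lr 0.1, milestones=[40, 70], gamma=0.5.
multi_lrs = [multi_step_lr(epoch, 0.1, [40, 70], gamma=0.5) for epoch in range(100)]
```

Here the learning rate is 0.1 up to epoch 39, 0.05 from epoch 40, and 0.025 from epoch 70.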
5. ExponentialLR()
ExponentialLR(optimizer, gamma, last_epoch=-1, verbose=False)
(1) Schedule: exponential decay of the initial learning rate (lr = initial_lr * gamma**epoch).
(2) Parameters:
[1] gamma: the base of the exponential decay;
model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.5)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(100):
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    y.append(lr)
    scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('ExponentialLR')
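A quick plain-Python sketch of the formula (hypothetical helper exponential_lr) makes it easy to see how aggressive gamma=0.5 is: the learning rate halves every epoch.

```python
def exponential_lr(epoch, base_lr, gamma):
    """Closed form of exponential decay: lr = base_lr * gamma ** epoch."""
    return base_lr * gamma ** epoch


# Same settings as the example above: base lr 0.1, gamma=0.5.
exp_lrs = [exponential_lr(epoch, 0.1, 0.5) for epoch in range(100)]
```

After 10 epochs the learning rate is already 0.1 / 1024, so values of gamma much closer to 1 are probably more practical for long runs.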
6. CosineAnnealingLR()
CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False)
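(1) Schedule: the learning rate follows a cosine curve whose maximum is the initial lr, whose minimum is eta_min, and whose half-period is T_max.
(2) Parameters:
[1] T_max: the half-period of the cosine schedule, in epochs;
[2] eta_min: the minimum learning rate, 0 by default;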
model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(100):
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    y.append(lr)
    scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('CosineAnnealingLR')
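The curve can be reproduced from the closed form eta_min + (initial_lr - eta_min) * (1 + cos(pi * epoch / T_max)) / 2. The helper cosine_lr below is a hypothetical plain-Python sketch of that formula, not PyTorch's step-by-step update.

```python
import math


def cosine_lr(epoch, base_lr, T_max, eta_min=0.0):
    """Closed-form cosine annealing with half-period T_max."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / T_max)) / 2


# Same settings as the example above: base lr 0.1, T_max=20, eta_min=0.
cos_lrs = [cosine_lr(epoch, 0.1, 20) for epoch in range(100)]
```

With T_max=20 the learning rate reaches eta_min at epoch 20 and is back at 0.1 at epoch 40, which is why the 100-epoch plot shows several full waves.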
7. ReduceLROnPlateau()
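ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, threshold=1e-4, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-8, verbose=False)
(1) Schedule: reduce the learning rate when a monitored metric stops improving.
(2) Parameters:
[1] mode: 'min' reduces the learning rate when the metric stops decreasing (e.g. loss); 'max' reduces it when the metric stops increasing (e.g. accuracy);
[2] factor: the decay factor, new_lr = lr * factor;
[3] patience: the number of epochs without improvement in the metric that are tolerated before the learning rate is reduced;
[4] threshold: the threshold for counting a value as a new best, so that only significant changes are taken into account;
[5] threshold_mode: either rel or abs;
    rel:
        mode=max: dynamic_threshold = best * (1 + threshold)
        mode=min: dynamic_threshold = best * (1 - threshold)
    abs:
        mode=max: dynamic_threshold = best + threshold
        mode=min: dynamic_threshold = best - threshold
[6] cooldown: the number of epochs to wait after a reduction before monitoring resumes;
[7] min_lr: a lower bound on the learning rate; a float or a list, where a list sets one bound per parameter group;
[8] eps: the minimal decay applied to the learning rate; if the difference between the old and new lr is smaller than eps, the update is skipped;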
model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(100):
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    val_loss = 0.9  # constant dummy validation loss, so the metric never improves
    y.append(lr)
    scheduler.step(val_loss)

plt.figure(dpi=300)
plt.plot(y)
plt.title('ReduceLROnPlateau')
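To make the roles of patience, threshold and eps concrete, here is a simplified plain-Python sketch of the decision logic. It covers only mode='min', threshold_mode='rel' and cooldown=0; the function name plateau_schedule is hypothetical, and the real scheduler handles more cases than this.

```python
import math


def plateau_schedule(losses, lr=0.1, factor=0.1, patience=10,
                     threshold=1e-4, min_lr=0.0, eps=1e-8):
    """Return the learning rate in effect before each scheduler step."""
    best = math.inf
    bad_epochs = 0
    history = []
    for loss in losses:
        history.append(lr)
        if loss < best * (1 - threshold):  # significant improvement
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:          # patience exhausted: decay
            new_lr = max(lr * factor, min_lr)
            if lr - new_lr > eps:          # skip negligible updates
                lr = new_lr
            bad_epochs = 0
    return history


# Same situation as the example above: the validation loss is stuck at 0.9.
plateau_lrs = plateau_schedule([0.9] * 100)
```

Under these assumptions the first value sets the best loss, the next 11 epochs count as bad, and the first reduction therefore shows up at index 12 of the recorded list, with a further reduction every 11 epochs after that.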
8. CyclicLR()
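CyclicLR(optimizer, base_lr, max_lr, step_size_up=2000, step_size_down=None, mode='triangular', gamma=1., scale_fn=None, scale_mode='cycle', cycle_momentum=True, base_momentum=0.8, max_momentum=0.9, last_epoch=-1, verbose=False)
(1) Schedule: the learning rate cycles between two bounds. It is adjusted per iteration rather than per epoch, so scheduler.step() is called after every batch; the number of iterations in one epoch depends on the dataset size and batch_size.
(2) Parameters:
[1] base_lr: the initial learning rate, which is the lower bound of the cycle;
[2] max_lr: the upper bound of the learning rate in each cycle, so the cycle amplitude is (max_lr - base_lr);
[3] step_size_up: the number of training iterations in the increasing half of a cycle;
[4] step_size_down: the number of training iterations in the decreasing half of a cycle; if None, it is set to step_size_up;
[5] mode: one of {triangular, triangular2, exp_range};
    triangular: a basic triangular cycle with no amplitude scaling
    triangular2: a basic triangular cycle that halves the initial amplitude every cycle
    exp_range: a cycle that scales the initial amplitude by gamma**(cycle iterations) at each iteration
[6] gamma: the constant in the exp_range scaling function;
[7] scale_fn: a custom scaling policy defined by a single-argument lambda function, where 0 <= scale_fn(x) <= 1 for all x >= 0. If specified, 'mode' is ignored;
[8] scale_mode: either cycle or iterations; defines whether scale_fn is evaluated on the cycle number or on the cycle iterations;
[9] cycle_momentum: if True, momentum is cycled inversely to the learning rate between 'base_momentum' and 'max_momentum';
[10] base_momentum: the lower momentum bound;
[11] max_momentum: the upper momentum bound;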
model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.001, max_lr=0.1, step_size_up=5, step_size_down=15)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(5):
    for batch in range(20):
        lr = optimizer.state_dict()['param_groups'][0]['lr']
        y.append(lr)
        scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('CyclicLR-triangular')
model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.001, max_lr=0.1, mode='triangular2', step_size_up=5, step_size_down=15)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(5):
    for batch in range(20):
        lr = optimizer.state_dict()['param_groups'][0]['lr']
        y.append(lr)
        scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('CyclicLR-triangular2')
model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.001, max_lr=0.1, mode='exp_range', gamma=0.95, step_size_up=5, step_size_down=15)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(5):
    for batch in range(20):
        lr = optimizer.state_dict()['param_groups'][0]['lr']
        y.append(lr)
        scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('CyclicLR-exp_range')
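The three modes differ only in how the triangle's amplitude is scaled. The plain-Python sketch below is written from the triangular policy as I understand it (position within the cycle, then a mode-dependent scale); the helper name cyclic_lr is hypothetical and a custom scale_fn is not covered.

```python
import math


def cyclic_lr(it, base_lr, max_lr, step_size_up, step_size_down=None,
              mode="triangular", gamma=1.0):
    """Learning rate at iteration `it` for the three built-in modes."""
    if step_size_down is None:
        step_size_down = step_size_up
    total = step_size_up + step_size_down
    ratio = step_size_up / total
    cycle = math.floor(1 + it / total)  # 1-based cycle number
    x = 1 + it / total - cycle          # position inside the cycle, in [0, 1)
    # Rising edge while x <= ratio, falling edge afterwards.
    scale = x / ratio if x <= ratio else (x - 1) / (ratio - 1)
    if mode == "triangular2":
        scale *= 1 / 2 ** (cycle - 1)   # halve the amplitude every cycle
    elif mode == "exp_range":
        scale *= gamma ** it            # shrink the amplitude every iteration
    return base_lr + (max_lr - base_lr) * scale


# Same settings as the examples above: 5 iterations up, 15 down.
tri = [cyclic_lr(i, 0.001, 0.1, 5, 15) for i in range(100)]
tri2 = [cyclic_lr(i, 0.001, 0.1, 5, 15, mode="triangular2") for i in range(100)]
exp_range = [cyclic_lr(i, 0.001, 0.1, 5, 15, mode="exp_range", gamma=0.95)
             for i in range(100)]
```

Note that the optimizer's own lr=0.1 plays no role here: the schedule starts from base_lr, peaks at iteration 5, and returns to base_lr at iteration 20.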
9. CosineAnnealingWarmRestarts()
CosineAnnealingWarmRestarts(optimizer, T_0, T_mult=1, eta_min=0, last_epoch=-1, verbose=False)
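(1) Schedule: cosine annealing with warm restarts; the learning rate decays along a cosine curve and jumps back to its initial value at each restart.
(2) Parameters:
[1] T_0: the number of epochs until the first restart;
[2] T_mult: the factor by which the restart period grows after each restart;
[3] eta_min: the minimum learning rate;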
model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=15, T_mult=2)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(100):
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    y.append(lr)
    scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('CosineAnnealingWarmRestarts')
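The interaction between T_0 and T_mult is easier to see when the restart bookkeeping is written out. The helper warm_restart_lr below is a hypothetical plain-Python sketch for integer epochs, not the library implementation.

```python
import math


def warm_restart_lr(epoch, base_lr, T_0, T_mult=1, eta_min=0.0):
    """Cosine annealing where the period starts at T_0 and grows by T_mult."""
    period, pos = T_0, epoch
    while pos >= period:  # skip over the periods that are already complete
        pos -= period
        period *= T_mult
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * pos / period)) / 2


# Same settings as the example above: base lr 0.1, T_0=15, T_mult=2.
restart_lrs = [warm_restart_lr(epoch, 0.1, 15, T_mult=2) for epoch in range(100)]
```

With T_0=15 and T_mult=2 the periods are 15, 30 and 60 epochs long, so the restarts fall at epochs 15 and 45 within the 100 plotted epochs.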
10. OneCycleLR()
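OneCycleLR(optimizer, max_lr, total_steps=None, epochs=None, steps_per_epoch=None, pct_start=0.3, anneal_strategy='cos', cycle_momentum=True, base_momentum=0.85, max_momentum=0.95, div_factor=25., final_div_factor=1e4, three_phase=False, last_epoch=-1, verbose=False)
(1) Schedule: sets the learning rate of each parameter group according to the "1cycle" policy. The 1cycle policy anneals the learning rate from an initial learning rate up to a maximum learning rate, and then from that maximum down to a minimum learning rate far below the initial one. As with CyclicLR, scheduler.step() is called after every batch.
(2) Parameters:
[1] max_lr: the upper bound of the learning rate in the cycle;
[2] total_steps: the total number of steps in the cycle; if not given, it is inferred from epochs * steps_per_epoch;
[3] epochs: the number of training epochs;
[4] steps_per_epoch: the number of training steps in each epoch;
[5] pct_start: the fraction of the cycle spent increasing the learning rate;
[6] anneal_strategy: the annealing strategy, either cos or linear;
[7] cycle_momentum: if True, momentum is cycled inversely to the learning rate between 'base_momentum' and 'max_momentum';
[8] base_momentum: the lower momentum bound;
[9] max_momentum: the upper momentum bound;
[10] div_factor: determines the initial learning rate via initial_lr = max_lr/div_factor;
[11] final_div_factor: determines the minimum learning rate via min_lr = initial_lr/final_div_factor;
[12] three_phase: if True, a third phase of the schedule annihilates the learning rate according to 'final_div_factor', instead of modifying the second phase;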
model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, pct_start=0.25, steps_per_epoch=20, epochs=5)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(5):
    for batch in range(20):
        lr = optimizer.state_dict()['param_groups'][0]['lr']
        y.append(lr)
        scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('OneCycleLR')
model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, pct_start=0.25, steps_per_epoch=20, epochs=5, three_phase=True)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(5):
    for batch in range(20):
        lr = optimizer.state_dict()['param_groups'][0]['lr']
        y.append(lr)
        scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('OneCycleLR-three_phase')
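To see how max_lr, div_factor, final_div_factor and pct_start fit together, here is a plain-Python sketch of the default two-phase schedule with cosine annealing. It is written from the parameter definitions above and is an approximation of the behaviour as I understand it; the helper names are hypothetical, three_phase=True and momentum cycling are not covered, and it assumes pct_start * total_steps is greater than 1.

```python
import math


def _cos_anneal(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2 * (1 + math.cos(math.pi * pct))


def one_cycle_lr(step, max_lr, total_steps, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Two-phase 1cycle schedule: warm up to max_lr, then anneal to min_lr."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    up_end = pct_start * total_steps - 1  # last step of the warm-up phase
    if step <= up_end:
        return _cos_anneal(initial_lr, max_lr, step / up_end)
    return _cos_anneal(max_lr, min_lr, (step - up_end) / (total_steps - 1 - up_end))


# Same settings as the first example above: 5 epochs of 20 steps, pct_start=0.25.
one_cycle_lrs = [one_cycle_lr(s, 0.1, 100, pct_start=0.25) for s in range(100)]
```

With max_lr=0.1 and the default factors, the schedule starts at 0.1 / 25 = 0.004, peaks at step 24, and ends at 0.004 / 1e4 = 4e-7.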