The learning rate is one of the most important hyperparameters in network training: a well-chosen learning rate speeds up convergence and helps the model avoid poor local minima. After being stuck in a rut and tortured by learning rates for the past few days, I decided to write down the learning rate adjustment methods in lr_scheduler.

Where the learning rate scheduler fits in the training loop, and how to check the current learning rate

import torch
import torch.nn as nn
import torch.optim

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9,
                            weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, 20, eta_min=0, last_epoch=-1)

for epoch in range(100):
    now_lr = optimizer.state_dict()['param_groups'][0]['lr']   # check the current learning rate
    train(...)
    test(...)
    scheduler.step()   # update the learning rate
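
Besides reading the optimizer's state dict as above, schedulers derived from _LRScheduler also expose get_last_lr(), which returns one value per parameter group. A minimal sketch of both ways of checking the current learning rate (reusing the optimizer and scheduler defined above; note that ReduceLROnPlateau may not provide get_last_lr() depending on the PyTorch version):

now_lr = optimizer.state_dict()['param_groups'][0]['lr']   # read from the optimizer
now_lr_list = scheduler.get_last_lr()                      # one lr per parameter group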

1. LambdaLR()

  LambdaLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)

(1) Learning rate adjustment: the initial lr multiplied by the given function.
(2) Common parameters:
  [1] optimizer: the optimizer;
  [2] last_epoch: defaults to -1, in which case the learning rate starts from its initial value;
  [3] verbose: defaults to False; if True, a message is printed every time the learning rate is updated;
(3) Parameters:
  [1] lr_lambda: the given function of the epoch index (or a list of such functions, one per parameter group);

import torch
import torch.nn as nn
import torch.optim
import matplotlib.pyplot as plt

model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.01, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
lambda1 = lambda epoch: 0.9 ** epoch
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda1)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(100):
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    y.append(lr)
    scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('LambdaLR')
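
lr_lambda can also be a list with one function per parameter group, so different groups follow different schedules. A minimal sketch (the two-group optimizer below is only an illustration, not part of the example above):

# Hypothetical optimizer with two parameter groups, each with its own schedule.
params1 = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
params2 = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD([{'params': params1},
                             {'params': params2, 'lr': 0.1}],
                            lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=[lambda epoch: 0.9 ** epoch,    # schedule for group 0
               lambda epoch: 0.95 ** epoch])  # schedule for group 1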

2. MultiplicativeLR()

  MultiplicativeLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)

(1) Learning rate adjustment: the current lr multiplied by the given function.
(2) Parameters:
  [1] lr_lambda: the given function;

model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.01, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
lambda1 = lambda epoch: 0.9 ** epoch
scheduler = torch.optim.lr_scheduler.MultiplicativeLR(optimizer, lr_lambda=lambda1)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(100):
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    y.append(lr)
    scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('MultiplicativeLR')
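
Note the difference from LambdaLR: LambdaLR sets lr = initial_lr * lr_lambda(epoch), while MultiplicativeLR multiplies the current lr by lr_lambda(epoch) at every step, so the same lambda1 decays the rate much faster here. A quick closed-form sketch of the two rules (plain Python, assuming the factor applied at step t is lr_lambda(t)):

initial_lr = 0.01
lambda1 = lambda epoch: 0.9 ** epoch

# LambdaLR: initial lr times the function of the epoch index.
lambda_lrs = [initial_lr * lambda1(epoch) for epoch in range(5)]   # 0.01, 0.009, 0.0081, ...

# MultiplicativeLR: each step multiplies the *current* lr by the function value.
multiplicative_lrs, lr = [], initial_lr
for epoch in range(5):
    multiplicative_lrs.append(lr)
    lr *= lambda1(epoch + 1)                                        # 0.01, 0.009, 0.00729, ...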

3. StepLR()

  StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1, verbose=False)

(1) Learning rate adjustment: decay the learning rate by gamma every step_size epochs.
(2) Parameters:
  [1] step_size: fixed interval (in epochs) between decays;
  [2] gamma: decay factor;

model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(100):
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    y.append(lr)
    scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('StepLR')
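
The result is a staircase: with lr=0.1, step_size=20 and gamma=0.1 the rate is 0.1 for epochs 0-19, 0.01 for epochs 20-39, and so on. A one-line closed form equivalent to the schedule above (a sketch, not the scheduler API):

step_lr = lambda epoch: 0.1 * 0.1 ** (epoch // 20)   # initial_lr * gamma ** (epoch // step_size)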

4. MultiStepLR()

  MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1, verbose=False)

(1) Learning rate adjustment: decay the initial learning rate by gamma at each of the specified milestones.
(2) Parameters:
  [1] milestones: list of epoch indices at which to decay; must be increasing;
  [2] gamma: decay factor;

model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40, 70], gamma=0.5)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(100):
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    y.append(lr)
    scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('MultiStepLR')
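
At epoch t the rate equals the initial lr times gamma raised to the number of milestones already passed. A closed-form sketch for the settings above (lr=0.1, milestones=[40, 70], gamma=0.5):

import bisect

# Number of milestones <= epoch determines the exponent.
multi_step_lr = lambda epoch: 0.1 * 0.5 ** bisect.bisect_right([40, 70], epoch)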

5. ExponentialLR()

  ExponentialLR(optimizer, gamma, last_epoch=-1, verbose=False)

(1) Learning rate adjustment: exponential decay of the initial learning rate (lr = initial_lr * gamma**epoch, i.e. the current lr is multiplied by gamma every epoch).
(2) Parameters:
  [1] gamma: base of the exponential decay;

model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.5)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(100):
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    y.append(lr)
    scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('ExponentialLR')

6. CosineAnnealingLR()

  CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False)

(1) Learning rate adjustment: the learning rate follows a cosine curve, with the initial lr as the maximum, eta_min as the minimum, and a half period of T_max epochs.
(2) Parameters:
  [1] T_max: half period of the cosine schedule, i.e. the number of epochs from maximum to minimum;
  [2] eta_min: minimum learning rate, defaults to 0;

model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(100):
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    y.append(lr)
    scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('CosineAnnealingLR')
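
For reference, the curve being traced is eta_t = eta_min + (eta_max - eta_min) * (1 + cos(pi * t / T_max)) / 2, which also explains why the rate climbs back up after T_max epochs when training continues past one half period. A closed-form sketch for the settings above (eta_max=0.1, eta_min=0, T_max=20):

import math

cosine_lr = lambda t: 0 + (0.1 - 0) * (1 + math.cos(math.pi * t / 20)) / 2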

7. ReduceLROnPlateau()

  ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, threshold=1e-4, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-8, verbose=False)

(1) Learning rate adjustment: reduce the learning rate when a monitored metric has stopped improving.
(2) Parameters:
  [1] mode: min means the monitored metric should decrease (e.g. loss), max means it should increase (e.g. accuracy);
  [2] factor: decay factor for the learning rate: new_lr = lr * factor;
  [3] patience: number of epochs with no improvement in the metric after which the learning rate is reduced;
  [4] threshold: threshold for measuring a significant change, so that only meaningful improvements count;
  [5] threshold_mode: either rel or abs;
    With rel,
    mode=max: dynamic_threshold = best * (1 + threshold)
    mode=min: dynamic_threshold = best * (1 - threshold)
    With abs,
    mode=max: dynamic_threshold = best + threshold
    mode=min: dynamic_threshold = best - threshold
  [6] cooldown: number of epochs to wait after a learning rate reduction before resuming normal monitoring;
  [7] min_lr: lower bound on the learning rate; a float, or a list with one value per parameter group;
  [8] eps: minimal decay applied to the learning rate; if the difference between the new and old lr is smaller than eps, the update is ignored;

model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(100):
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    val_loss = 0.9   # constant dummy validation loss, so the metric never improves
    y.append(lr)
    scheduler.step(val_loss)

plt.figure(dpi=300)
plt.plot(y)
plt.title('ReduceLROnPlateau')
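
Because val_loss is a constant 0.9 in this toy loop, the metric never improves, so with the defaults (patience=10, factor=0.1) the learning rate is reduced each time the metric has failed to improve for more than 10 epochs. In real training the actual validation metric is passed to step(); a minimal sketch assuming an accuracy metric (train() and validate() are hypothetical placeholders, not part of the example above):

scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='max', factor=0.5, patience=5)   # 'max': the monitored metric should increase

for epoch in range(100):
    train(...)                   # hypothetical training step
    val_acc = validate(...)      # hypothetical: returns validation accuracy
    scheduler.step(val_acc)      # pass the monitored metric to step()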

8. CyclicLR()

  CyclicLR(optimizer, base_lr, max_lr, step_size_up=2000, step_size_down=None, mode='triangular', gamma=1., scale_fn=None, scale_mode='cycle', cycle_momentum=True, base_momentum=0.8, max_momentum=0.9, last_epoch=-1, verbose=False)

(1) Learning rate adjustment: cycles the learning rate periodically. The cycle is counted in iterations (batches) rather than epochs; the number of iterations per epoch depends on the batch size, and scheduler.step() is meant to be called after every batch.
(2) Parameters:
  [1] base_lr: initial learning rate, the lower bound of the cycle;
  [2] max_lr: upper bound of the learning rate within each cycle, i.e. the cycle amplitude is (max_lr - base_lr);
  [3] step_size_up: number of training iterations in the increasing half of the cycle;
  [4] step_size_down: number of training iterations in the decreasing half of the cycle;
  [5] mode: one of {triangular, triangular2, exp_range};
    triangular: a basic triangular cycle without amplitude scaling
    triangular2: a basic triangular cycle that halves the initial amplitude each cycle
    exp_range: scales the initial amplitude by gamma**(cycle iterations) at each cycle iteration
  [6] gamma: constant used in the exp_range scaling function;
  [7] scale_fn: a custom scaling policy defined by a single-argument lambda, where 0 <= scale_fn(x) <= 1 for all x >= 0; if given, mode is ignored;
  [8] scale_mode: either cycle or iterations; defines whether scale_fn is evaluated on the cycle number or on cycle iterations;
  [9] cycle_momentum: if True, momentum is cycled inversely to the learning rate between base_momentum and max_momentum;
  [10] base_momentum: lower momentum bound;
  [11] max_momentum: upper momentum bound;

model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.001, max_lr=0.1, step_size_up=5, step_size_down=15)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(5):
    for batch in range(20):
        lr = optimizer.state_dict()['param_groups'][0]['lr']
        y.append(lr)
        scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('CyclicLR-triangular')

model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.001, max_lr=0.1, mode='triangular2', step_size_up=5, step_size_down=15)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(5):
    for batch in range(20):
        lr = optimizer.state_dict()['param_groups'][0]['lr']
        y.append(lr)
        scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('CyclicLR-triangular2')

model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.001, max_lr=0.1, mode='exp_range', gamma=0.95, step_size_up=5, step_size_down=15)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(5):
    for batch in range(20):
        lr = optimizer.state_dict()['param_groups'][0]['lr']
        y.append(lr)
        scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('CyclicLR-exp_range')
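
scale_fn can replace the three built-in modes with a custom amplitude schedule. A minimal sketch reusing the optimizer above (the halving function below is only an illustration and roughly reproduces triangular2; mode is ignored once scale_fn is given):

scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.001, max_lr=0.1,
    step_size_up=5, step_size_down=15,
    scale_fn=lambda cycle: 1.0 / (2 ** (cycle - 1)),   # halve the amplitude each cycle
    scale_mode='cycle')                                # evaluate scale_fn per cycle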

9. CosineAnnealingWarmRestarts()

  CosineAnnealingWarmRestarts(optimizer, T_0, T_mult=1, eta_min=0, last_epoch=-1, verbose=False)

(1) Learning rate adjustment: cosine annealing with warm restarts; the learning rate follows a cosine curve from the initial lr down to eta_min, and is reset ("restarted") to the initial lr at the end of each period.
(2) Parameters:
  [1] T_0: number of epochs until the first restart;
  [2] T_mult: factor by which the restart period grows after each restart;
  [3] eta_min: minimum learning rate;

model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=15, T_mult=2)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(100):
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    y.append(lr)
    scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('CosineAnnealingWarmRestarts')
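
With T_0=15 and T_mult=2 the restarts fall at epochs 15, 45 and 105, each period being twice as long as the previous one. The scheduler can also be stepped fractionally within an epoch for a smoother curve, along the lines of the official example; a minimal sketch (iters = 20 batches per epoch and train_batch() are assumptions for illustration):

iters = 20   # assumed number of batches per epoch
for epoch in range(100):
    for i in range(iters):
        train_batch(...)                       # hypothetical training step
        scheduler.step(epoch + i / iters)      # fractional epoch -> smooth cosine between restarts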

10. OneCycleLR()

  OneCycleLR(optimizer, max_lr, total_steps=None, epochs=None, steps_per_epoch=None, pct_start=0.3, anneal_strategy='cos', cycle_momentum=True, base_momentum=0.85, max_momentum=0.95, div_factor=25., final_div_factor=1e4, three_phase=False, last_epoch=-1, verbose=False)

(1) Learning rate adjustment: sets the learning rate of each parameter group according to the "1cycle" policy, which anneals the learning rate from an initial value up to a maximum learning rate, and then down to a minimum learning rate much lower than the initial one. Like CyclicLR, it is stepped per batch rather than per epoch.
(2) Parameters:
  [1] max_lr: upper bound of the learning rate in the cycle;
  [2] total_steps: total number of steps in the cycle;
  [3] epochs: number of training epochs;
  [4] steps_per_epoch: number of training steps per epoch;
  [5] pct_start: fraction of the cycle spent increasing the learning rate;
  [6] anneal_strategy: annealing strategy, either cos or linear;
  [7] cycle_momentum: if True, momentum is cycled inversely to the learning rate between base_momentum and max_momentum;
  [8] base_momentum: lower momentum bound;
  [9] max_momentum: upper momentum bound;
  [10] div_factor: determines the initial learning rate via initial_lr = max_lr / div_factor;
  [11] final_div_factor: determines the minimum learning rate via min_lr = initial_lr / final_div_factor;
  [12] three_phase: if True, a third phase of the schedule is used to annihilate the learning rate according to final_div_factor, instead of modifying the second phase;

model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, pct_start=0.25, steps_per_epoch=20, epochs=5)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(5):
    for batch in range(20):
        lr = optimizer.state_dict()['param_groups'][0]['lr']
        y.append(lr)
        scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('OneCycleLR')

model = [nn.Parameter(torch.randn(4, 4, requires_grad=True))]
optimizer = torch.optim.SGD(model, lr=0.1, momentum=0.9, weight_decay=1e-5)
# -----------------------------------------------------------------------------------
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, pct_start=0.25, steps_per_epoch=20, epochs=5, three_phase=True)
# -----------------------------------------------------------------------------------
y = []
for epoch in range(5):
    for batch in range(20):
        lr = optimizer.state_dict()['param_groups'][0]['lr']
        y.append(lr)
        scheduler.step()

plt.figure(dpi=300)
plt.plot(y)
plt.title('OneCycleLR-three_phase')
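
With these settings total_steps = epochs * steps_per_epoch = 100, the warm-up takes roughly pct_start * total_steps = 25 steps, the starting rate is max_lr / div_factor = 0.1 / 25 = 0.004, and the final rate approaches initial_lr / final_div_factor = 4e-7. A small sketch of those derived quantities (plain arithmetic, assuming the configuration above):

max_lr, div_factor, final_div_factor = 0.1, 25., 1e4
initial_lr = max_lr / div_factor             # 0.004, the lr at the first step
min_lr = initial_lr / final_div_factor       # 4e-7, the lr approached at the last step
total_steps = 5 * 20                         # epochs * steps_per_epoch = 100
warmup_steps = 0.25 * total_steps            # roughly 25 steps spent increasing the lr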
