Dropout Reduces Underfitting

Affiliations: Meta AI, UC Berkeley, MBZUAI

Paper: https://arxiv.org/abs/2303.01500

Code (just open-sourced):

https://github.com/facebookresearch/dropout

Date: submitted March 2, 2023

Date read: 2023-03-08

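Main contributions:

The paper proposes two modifications of standard dropout, early dropout and late dropout, which improve underfitting and overfitting models respectively.

Specifically:

early dropout: apply dropout only during an initial phase of training and turn it off for the rest; this helps underfitting models fit the data better.
late dropout: train without dropout at first and enable it for the rest of training; this reduces overfitting in models that already use dropout.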

Paper code

Implementation of early dropout and late dropout

# https://github.com/facebookresearch/dropout/blob/main/drop_scheduler.py
import numpy as np


def drop_scheduler(drop_rate, epochs, niter_per_ep, cutoff_epoch=0, mode="standard", schedule="constant"):
    '''
    drop_rate: dropout rate
    epochs: number of training epochs
    niter_per_ep: number of training steps per epoch
    cutoff_epoch: in "early" mode, the epoch at which dropout stops being used;
                  in "late" mode, the epoch at which dropout starts being used
    mode: one of ["standard", "early", "late"]
    schedule: whether the rate is constant or decays linearly, one of ["constant", "linear"]
    return: np.array of length epochs * niter_per_ep holding the dropout rate for each batch
    '''
    assert mode in ["standard", "early", "late"]
    if mode == "standard":
        return np.full(epochs * niter_per_ep, drop_rate)

    early_iters = cutoff_epoch * niter_per_ep
    late_iters = (epochs - cutoff_epoch) * niter_per_ep

    if mode == "early":
        assert schedule in ["constant", "linear"]
        if schedule == 'constant':
            early_schedule = np.full(early_iters, drop_rate)
        elif schedule == 'linear':
            early_schedule = np.linspace(drop_rate, 0, early_iters)
        final_schedule = np.concatenate((early_schedule, np.full(late_iters, 0)))
    elif mode == "late":
        assert schedule in ["constant"]
        early_schedule = np.full(early_iters, 0)
        final_schedule = np.concatenate((early_schedule, np.full(late_iters, drop_rate)))

    assert len(final_schedule) == epochs * niter_per_ep
    return final_schedule
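To make the shape of the returned array concrete, here is a small worked example that rebuilds two of the schedules by hand with the same NumPy calls the scheduler uses. The settings (4 epochs, 2 steps per epoch, cutoff at epoch 2, rate 0.1) are made up for illustration and are not taken from the paper.

```python
import numpy as np

# Hypothetical small settings, chosen only so the arrays are easy to read.
drop_rate, epochs, niter_per_ep, cutoff_epoch = 0.1, 4, 2, 2

early_iters = cutoff_epoch * niter_per_ep             # iterations before the cutoff
late_iters = (epochs - cutoff_epoch) * niter_per_ep   # iterations after the cutoff

# mode="early", schedule="linear": decay from drop_rate to 0, then stay at 0.
early_linear = np.concatenate(
    (np.linspace(drop_rate, 0, early_iters), np.full(late_iters, 0))
)

# mode="late", schedule="constant": 0 until the cutoff, then drop_rate.
late_constant = np.concatenate(
    (np.full(early_iters, 0), np.full(late_iters, drop_rate))
)

print(np.round(early_linear, 4))
print(late_constant)
```

With these settings both arrays have 8 entries, one per training step. The early linear schedule falls from 0.1 to 0 over the first 4 steps and is 0 afterwards, while the late constant schedule is 0 for the first 4 steps and 0.1 for the last 4.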

Using early dropout and late dropout

# Add an update_dropout method to the model definition, e.g.:
# https://github.com/facebookresearch/dropout/blob/main/models/vision_transformer.py
# (these fragments assume `import torch` and `from torch import nn`)
# ...
def update_dropout(self, drop_rate):
    self.drop_rate = drop_rate
    for module in self.modules():
        if isinstance(module, nn.Dropout):
            module.p = drop_rate

# When using drop_path (stochastic depth; for details see: https://github.com/huggingface/pytorch-image-models/blob/4b8cfa6c0a355a9b3cb2a77298b240213fb3b921/timm/layers/drop.py#L137)
# https://github.com/facebookresearch/dropout/blob/main/models/vision_transformer.py
def update_drop_path(self, drop_path_rate):
    self.drop_path = drop_path_rate
    dp_rates = [x.item() for x in torch.linspace(0, drop_path_rate, self.depth)]
    for i in range(self.depth):
        self.blocks[i].drop_path.drop_prob = dp_rates[i]

# During training, update the dropout rate on every batch
# https://github.com/facebookresearch/dropout/blob/main/engine.py#L114
model.module.update_dropout(schedules['do'][it])
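The pieces above can be put together in a minimal end-to-end sketch. It is not the repository's training engine: `TinyMLP`, `train_with_schedule`, the random batches, and all hyperparameters are hypothetical stand-ins, and only `update_dropout` mirrors the method shown above. The model is not wrapped in DistributedDataParallel here, so the method is called on `model` directly rather than on `model.module`.

```python
import numpy as np
import torch
from torch import nn


class TinyMLP(nn.Module):
    """Hypothetical toy model; only update_dropout follows the notes above."""

    def __init__(self, drop_rate=0.0):
        super().__init__()
        self.drop_rate = drop_rate
        self.net = nn.Sequential(
            nn.Linear(8, 16), nn.ReLU(), nn.Dropout(drop_rate), nn.Linear(16, 2)
        )

    def forward(self, x):
        return self.net(x)

    def update_dropout(self, drop_rate):
        # Overwrite the probability of every nn.Dropout layer in the model.
        self.drop_rate = drop_rate
        for module in self.modules():
            if isinstance(module, nn.Dropout):
                module.p = drop_rate


def train_with_schedule(model, schedule, niter_per_ep, epochs):
    """Run a toy loop and return the dropout rate in effect at each step."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    applied = []
    model.train()
    for epoch in range(epochs):
        for step in range(niter_per_ep):
            it = epoch * niter_per_ep + step  # global iteration index
            model.update_dropout(float(schedule[it]))
            x = torch.randn(4, 8)             # random stand-in batch
            y = torch.randint(0, 2, (4,))     # random stand-in labels
            loss = loss_fn(model(x), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            applied.append(model.net[2].p)    # rate used by the Dropout layer
    return applied


# Early dropout, constant schedule: rate 0.1 during the first epoch, 0 afterwards.
epochs, niter_per_ep, cutoff_epoch = 3, 2, 1
schedule = np.concatenate(
    (np.full(cutoff_epoch * niter_per_ep, 0.1),
     np.zeros((epochs - cutoff_epoch) * niter_per_ep))
)
applied = train_with_schedule(TinyMLP(drop_rate=0.1), schedule, niter_per_ep, epochs)
print(applied)
```

Because `nn.Dropout` reads its `p` attribute on every forward pass, changing `p` between steps is enough to switch dropout on or off without rebuilding the model.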

Results

   See the paper and the GitHub repository for details.

References: https://mp.weixin.qq.com/s/TqdOoHMtbQxveNSGgRC6Rw
https://github.com/facebookresearch/dropout
https://stackoverflow.com/questions/69175642/droppath-in-timm-seems-like-a-dropout
