Hint: If you want to see a list of allocated tensors when OOM happens,

Problem description:

While building a siamese network with keras, I ran into the following error:

OOM when allocating tensor with shape[129024,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[Node: dense_1/kernel/Assign = Assign[T=DT_FLOAT, _class=["loc:@dense_1/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](dense_1/kernel, dense_1/kernel/cond/Merge)]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

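After looking into it, I concluded that the machine was running out of memory, so I changed batch_size and changed how sample pairs are generated from the original dataset. Neither helped.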

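On closer thought, host memory should not have been the problem: the original dataset is only 600 MB, and I was running on a server with 128 GB of RAM that nobody else was using. I watched the server's memory usage with top while the program ran, and more than 100 GB stayed free the whole time. That rules out insufficient host memory.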

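That left GPU memory as the likely culprit (I am not very clear on the exact differences between host memory and GPU memory or how each is used, so corrections are welcome). I then read the error log carefully (note to self: do not look only at the Traceback part; the "Caused by" part matters more!).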

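It turned out that the fully connected layer has so many parameters that GPU memory runs out: a float kernel of shape [129024,4096] holds 129024 × 4096 = 528,482,304 values, about 1.97 GiB at 4 bytes each. That really is too many; it was an oversight when I designed the network.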
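To make the size of that tensor concrete, here is a minimal sketch of the arithmetic. It assumes 4 bytes per element (the log reports "type float", i.e. DT_FLOAT); the helper name tensor_bytes is made up for illustration and is not part of keras or tensorflow.

```python
from math import prod

# Bytes per element; the log says "type float", i.e. 32-bit floats.
BYTES_PER_FLOAT32 = 4

def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the memory needed to hold one dense tensor of the given shape."""
    return prod(shape) * bytes_per_element

kernel_shape = (129024, 4096)  # shape reported in the OOM message
n_params = prod(kernel_shape)
size_gib = tensor_bytes(kernel_shape) / 2**30

print(f"parameters: {n_params:,}")
print(f"size: {size_gib:.2f} GiB")
```

This counts only a single copy of the weights. Training normally needs more on top of it (gradients, and per-weight optimizer state for optimizers that keep any), so the real footprint of this layer is likely a multiple of that figure.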

Tip: with errors caused by OOM, the most important thing is to locate the line of code that triggers the OOM!

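You can do this by reading the error log carefully and by setting markers in the program (print a marker at each suspect location). A senior labmate once hit an OOM caused by dynamic array allocation and finally tracked it down by setting markers.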
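The marker approach could look roughly like the sketch below. The helper mark and the bytearray allocation are hypothetical stand-ins, not code from the original program. Note that tracemalloc only sees memory allocated by the Python interpreter on the host, not GPU memory, so for a GPU OOM the useful part is simply which marker was printed last before the crash.

```python
import sys
import tracemalloc

def mark(label):
    """Print a marker with the current and peak Python heap usage in MiB."""
    current, peak = tracemalloc.get_traced_memory()
    # Write to stderr and flush so the marker is not lost if the process dies.
    print(f"[mark] {label}: current={current / 2**20:.1f} MiB, "
          f"peak={peak / 2**20:.1f} MiB", file=sys.stderr, flush=True)
    return current, peak

tracemalloc.start()
mark("before allocation")
buffer = bytearray(8 * 2**20)  # stand-in for a suspect allocation (8 MiB)
after_current, after_peak = mark("after allocation")
del buffer
tracemalloc.stop()
```

Placing one mark call before and one after each suspect step narrows the failure down to the step whose second marker never appears.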

After I reworked the network structure, the model ran without problems on a machine with 16 GB of memory.

In deep learning, the most direct fix is to reduce batch_size or the number of units in the hidden layer.
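The two knobs affect different tensors, which may be why changing batch_size alone did not help here: the failing node in the log is dense_1/kernel/Assign, an assignment to the layer's weights. The sketch below compares the two, assuming float32 values; the function names and the alternative unit counts and batch sizes are hypothetical and chosen only for illustration.

```python
BYTES_PER_FLOAT32 = 4

def dense_kernel_bytes(n_inputs, n_units):
    """Memory for a dense layer's weight matrix; does not depend on batch size."""
    return n_inputs * n_units * BYTES_PER_FLOAT32

def dense_output_bytes(batch_size, n_units):
    """Memory for the layer's output activations for one batch."""
    return batch_size * n_units * BYTES_PER_FLOAT32

n_inputs = 129024  # input width from the OOM message

# Kernel size as the number of units shrinks (1024 and 256 are hypothetical).
for n_units in (4096, 1024, 256):
    kernel_mib = dense_kernel_bytes(n_inputs, n_units) / 2**20
    print(f"units={n_units}: kernel {kernel_mib:.0f} MiB")

# Output activation size for two hypothetical batch sizes.
for batch_size in (128, 32):
    out_mib = dense_output_bytes(batch_size, 4096) / 2**20
    print(f"batch_size={batch_size}: output {out_mib:.2f} MiB")
```

With these numbers the kernel drops from 2016 MiB at 4096 units to 126 MiB at 256 units, while the batch size only changes this layer's output by a couple of MiB. A smaller batch_size still helps when activations dominate, but it cannot shrink the weights themselves.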
---------------------
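Author: huowa9077
Source: CSDN
Original post: https://blog.csdn.net/huowa9077/article/details/81042553
Copyright notice: This is an original article by the blogger; please include a link to the post when reposting.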
