TensorFlow out of memory and CPU/GPU usage
I'm using TensorFlow with Keras to train a neural network for object detection (YOLO).
I wrote the model and am trying to train it with keras model.fit_generator() on batches of 32 images of size 416x416x3.
I'm using an NVIDIA GeForce RTX 2070 GPU with 8 GB of memory (TensorFlow uses about 6.6 GB).
However, when I start training the model, I get messages like this:
```
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape
2019-02-11 16:13:08.051289: W tensorflow/core/common_runtime/bfc_allocator.cc:267] Allocator (GPU_0_bfc) ran out of memory trying to allocate 338.00MiB. Current allocation summary follows.
2019-02-11 16:13:08.057318: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (256): Total Chunks: 1589, Chunks in use: 1589. 397.3KiB allocated for chunks. 397.3KiB in use in bin. 25.2KiB client-requested in use in bin.
2019-02-11 16:13:08.061222: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (512): Total Chunks: 204, Chunks in use: 204. 102.0KiB allocated for chunks. 102.0KiB in use in bin. 100.1KiB client-requested in use in bin.
...
2019-02-11 16:13:08.142674: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (268435456): Total Chunks: 11, Chunks in use: 11. 5.05GiB allocated for chunks. 5.05GiB in use in bin. 4.95GiB client-requested in use in bin.
2019-02-11 16:13:08.148149: I tensorflow/core/common_runtime/bfc_allocator.cc:613] Bin for 338.00MiB was 256.00MiB, Chunk State:
2019-02-11 16:13:08.150474: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 000000070B400000 of size 1280
2019-02-11 16:13:08.152627: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 000000070B400500 of size 256
2019-02-11 16:13:08.154790: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 000000070B400600 of size 256
...
2019-02-11 16:17:38.699526: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Sum Total of in-use chunks: 6.11GiB
2019-02-11 16:17:38.701621: I tensorflow/core/common_runtime/bfc_allocator.cc:647] Stats: Limit: 6624727531 InUse: 6557567488 MaxInUse: 6590199040 NumAllocs: 3719 MaxAllocSize: 1624768512
2019-02-11 16:17:38.708981: W tensorflow/core/common_runtime/bfc_allocator.cc:271] ****************************************************************************************************
2019-02-11 16:17:38.712172: W tensorflow/core/framework/op_kernel.cc:1412] OP_REQUIRES failed at conv_ops_fused.cc:734 : Resource exhausted: OOM when allocating tensor with shape[16,256,52,52] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
```
I've only reported a few lines of the message, but this is clearly a memory usage problem.
Maybe I should use the CPU in my generator function to read the images and labels from files?
What should I do in this situation?
Thanks.
416x416 is a big size for a neural network.
The solution in this case is to reduce the batch size.
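To see why this helps, here is a back-of-the-envelope sketch (not TensorFlow's exact accounting) of how activation memory scales linearly with batch size, using the failing tensor shape `[16, 256, 52, 52]` from the log:

```python
def activation_bytes(batch, channels, height, width, bytes_per_elem=4):
    """Approximate memory for one float32 activation tensor (NCHW layout)."""
    return batch * channels * height * width * bytes_per_elem

# The tensor that failed to allocate in the log above, and the same
# tensor at half the batch size.
full = activation_bytes(16, 256, 52, 52)
half = activation_bytes(8, 256, 52, 52)

print(f"batch 16: {full / 2**20:.2f} MiB")  # 42.25 MiB
print(f"batch  8: {half / 2**20:.2f} MiB")
```

Every intermediate tensor in the network shrinks the same way, so halving the batch roughly halves the peak allocation.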
Other solutions, which you may not like, are:
- Reduce the model capacity (fewer units/filters in the layers)
- Reduce the image size
- If you are using float64, try float32 (this may be difficult in Keras, depending on which layers you use)
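As a rough illustration of the dtype point (a NumPy sketch of the memory arithmetic, not a Keras recipe), switching from float64 to float32 halves the bytes of every tensor. In Keras the global default can be set with `tf.keras.backend.set_floatx('float32')`, though individual layers may still need attention:

```python
import numpy as np

# One batch of 32 RGB images at 416x416, as in the question,
# stored in each of the two floating-point precisions.
batch64 = np.zeros((32, 416, 416, 3), dtype=np.float64)
batch32 = np.zeros((32, 416, 416, 3), dtype=np.float32)

print(f"float64: {batch64.nbytes / 2**20:.2f} MiB")  # 126.75 MiB
print(f"float32: {batch32.nbytes / 2**20:.2f} MiB")
```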
Keras/TensorFlow has a strange behavior when allocating memory. I don't know how it works, but I've seen fairly big models pass while smaller models fail. Those smaller models, however, had more complicated operations and branches.
An important point:
If this problem is happening in your first convolutional layer, then nothing in the rest of the model will help; you need to reduce the number of filters in the first layer (or the image size).
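The first layer dominates because its output still has the full spatial resolution. A rough sketch of the arithmetic (the filter counts below are illustrative, not the question's actual model):

```python
def conv_output_bytes(batch, filters, height, width, bytes_per_elem=4):
    """Approximate float32 activation memory of one conv layer's output
    (assuming 'same' padding, so spatial size is unchanged)."""
    return batch * filters * height * width * bytes_per_elem

b = 32  # batch size from the question
base = conv_output_bytes(b, 64, 416, 416)       # hypothetical 64-filter first layer
fewer_filters = conv_output_bytes(b, 32, 416, 416)  # half the filters
smaller_image = conv_output_bytes(b, 64, 208, 208)  # half the image side

print(f"64 filters @ 416x416: {base / 2**20:.0f} MiB")
print(f"32 filters @ 416x416: {fewer_filters / 2**20:.0f} MiB")  # halves it
print(f"64 filters @ 208x208: {smaller_image / 2**20:.0f} MiB")  # quarters it
```

Halving the filter count halves the first layer's activation memory, while halving the image side quarters it, since both spatial dimensions shrink.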