Replies: 10 comments
-
We suggest trying Paddle 2.6.
-
Switching to Paddle 2.6 produces a different error.
-
What is the training command? We will look into the problem.
-
Also, if the error is No module named 'paddle.fluid', please check which PaddleOCR branch you are on, and whether it is the release/2.7 branch or the dygraph branch.
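Before switching branches, it can help to confirm whether the installed Paddle still provides `paddle.fluid` at all. A minimal sketch using only the Python standard library; `has_module` is a hypothetical helper name, not part of Paddle or PaddleOCR:

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` can be located by the import system.

    Note: resolving a dotted name such as 'paddle.fluid' imports the
    parent package ('paddle') as a side effect.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when the parent package is missing or has no usable spec.
        return False


if __name__ == "__main__":
    # Reports False when Paddle is not installed or the submodule is absent.
    print("paddle.fluid available:", has_module("paddle.fluid"))
```

If this reports False while the PaddleOCR checkout still imports `paddle.fluid`, the Paddle version and the PaddleOCR branch probably do not match.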
-
Thanks for the reply. The training command is
-
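Hmm, PaddleOCR is currently not compatible with paddlenlp 2.7. As for the GPU out-of-memory (OOM) error with paddlepaddle-gpu 2.5.2.post117 and paddlenlp 2.5.2: in our tests it only occurs intermittently, so you could try running it a few more times.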
-
On my side it reproduces every time. Could it be caused by my CUDA version being 11.8? There is no paddlepaddle-gpu build that matches CUDA 11.8.
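The mismatch described above can be read straight off the wheel version string. A minimal sketch, assuming the `.postNNN` suffix encodes the CUDA version the wheel was built for (two digits of major version, then the minor version, so `post117` means CUDA 11.7); `cuda_from_wheel_version` is a hypothetical helper name, not a Paddle API:

```python
import re
from typing import Optional


def cuda_from_wheel_version(version: str) -> Optional[str]:
    """Return the CUDA version implied by a '.postNNN' suffix, or None.

    Assumption: the first two digits are the CUDA major version and the
    remaining digits are the minor version (e.g. post117 -> "11.7").
    """
    match = re.search(r"\.post(\d{2})(\d+)$", version)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


wheel_cuda = cuda_from_wheel_version("2.5.2.post117")  # version from this thread
local_cuda = "11.8"                                    # CUDA version reported above
print(wheel_cuda, wheel_cuda == local_cuda)
```

A difference here is only a hint: it shows that the wheel and the local toolkit were not built as a pair, not that the mismatch causes the OOM.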
-
That is indeed possible. We recommend using the official Paddle Docker image and training inside Docker, which avoids most environment problems.
-
Training in Docker still fails. The image I pulled is paddlepaddle/paddle:2.6.1-gpu-cuda11.7-cudnn8.4-trt8.4
-
What environment configuration did you use for testing on your side? With Docker I still reproduce the OOM error.
-
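Please provide the following information to quickly locate the problem
When running SER training for the KIE model, I get a GPU out-of-memory error. I have already tried adjusting both batch_size and num_workers, but GPU memory is still insufficient.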
Ubuntu 18.04.6
paddlepaddle-gpu 2.5.2.post117
paddlenlp 2.5.2
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml
Out of memory error on GPU 0. Cannot allocate 937.335938PB memory on GPU 0, 3.078247GB memory has been allocated and available memory is only 7.823547GB.
Please check whether there is any other process using GPU 0.
(at ../paddle/fluid/memory/allocation/cuda_allocator.cc:86)
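The sizes in this message are worth comparing before tuning batch_size any further. A minimal sketch that parses the log line above and flags a request far beyond what the GPU has free; `parse_oom` and `is_implausible` are hypothetical helper names, and the binary (1024-based) units are an assumption about how Paddle formats sizes:

```python
import re

# Assumption: units in the log are binary multiples (1 GB = 1024**3 bytes).
UNIT_BYTES = {
    "B": 1, "KB": 1024, "MB": 1024**2,
    "GB": 1024**3, "TB": 1024**4, "PB": 1024**5,
}

MESSAGE = (
    "Out of memory error on GPU 0. Cannot allocate 937.335938PB memory on "
    "GPU 0, 3.078247GB memory has been allocated and available memory is "
    "only 7.823547GB."
)


def parse_oom(message: str):
    """Return (requested, allocated, available) in bytes from the log line."""
    sizes = re.findall(r"(\d+(?:\.\d+)?)(PB|TB|GB|MB|KB|B)\b", message)
    if len(sizes) != 3:
        raise ValueError("unexpected OOM message format")
    return tuple(float(value) * UNIT_BYTES[unit] for value, unit in sizes)


def is_implausible(requested: float, available: float, factor: float = 1000.0) -> bool:
    """True when the request exceeds free memory by more than `factor` times."""
    return requested > available * factor


requested, allocated, available = parse_oom(MESSAGE)
print(is_implausible(requested, available))
```

A request measured in petabytes cannot be satisfied by lowering batch_size or num_workers, so one possible reading is that the requested size itself is invalid (for example a bad shape or an overflow) rather than real memory pressure. The thread does not confirm this.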
Please try to not include the image in the issue.