Fine-tuning ch_PP-OCRv4_det_server_train: out of memory when evaluating the model during training #16428
Replies: 17 comments
-
Not enough GPU memory. Reduce the batch size.
-
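The train batch size is 8, and training runs without problems. The eval batch_size is 1, but evaluation will not run. Evaluation happens every 1000 steps during training, and that is when it fails with "out of GPU memory". The first 1000 training steps all run normally.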
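One possible explanation, not confirmed anywhere in this thread, is that evaluation images pass through the detector at a much higher resolution than training crops do. Memory would then depend on image size rather than on batch size. The sketch below illustrates a generic "limit side length" resize rule of the kind used in text detection pipelines. The function name `resized_shape` and its parameters are hypothetical and are not PaddleOCR's API.

```python
def resized_shape(h, w, limit_side_len=736, limit_type="min"):
    """Return (height, width) after a limit-side-length resize.

    Hypothetical helper for illustration only.
    "max": shrink the image if its longest side exceeds the limit.
    "min": enlarge the image if its shortest side is below the limit;
           large images are left at full size.
    Both sides are then snapped to a multiple of 32.
    """
    if limit_type == "max":
        side = max(h, w)
        scale_needed = side > limit_side_len
    elif limit_type == "min":
        side = min(h, w)
        scale_needed = side < limit_side_len
    else:
        raise ValueError(f"unknown limit_type: {limit_type!r}")

    if scale_needed:
        h = h * limit_side_len / side
        w = w * limit_side_len / side

    def snap(x):
        # Round to the nearest multiple of 32, never below 32.
        return max(32, round(x / 32) * 32)

    return snap(h), snap(w)


# A 3000x4000 photo under each rule (pixel count drives activation memory).
full_h, full_w = resized_shape(3000, 4000, 736, "min")
small_h, small_w = resized_shape(3000, 4000, 960, "max")
pixel_ratio = (full_h * full_w) / (small_h * small_w)
```

Under these assumptions a "min" rule leaves a large photo at roughly full resolution, so a single image can need far more memory than a batch of small training crops. If your eval set contains large images, checking the resize settings in the Eval section of your config may be worth a try.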
-
Have you tried changing the step interval between evaluations? Make it smaller.
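For context, "evaluate every 1000 steps" corresponds to an interval-based trigger. Here is a minimal sketch, assuming a `[start_step, interval]` pair like the one presumably set in the training config. The function name `should_eval` is hypothetical, and the actual logic in PaddleOCR may differ.

```python
def should_eval(global_step, start_step, interval):
    """Return True when an evaluation pass is due at this training step.

    Hypothetical helper: evaluation fires once the step count has moved
    past start_step and then every `interval` steps after that.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return global_step > start_step and (global_step - start_step) % interval == 0


# Steps at which evaluation would run in the first 3000 training steps.
eval_steps_1000 = [s for s in range(1, 3001) if should_eval(s, 0, 1000)]
eval_steps_100 = [s for s in range(1, 301) if should_eval(s, 0, 100)]
```

A smaller interval does not change how much memory one evaluation pass needs. It only makes the first pass, and so any failure, arrive sooner, which shortens each debugging attempt.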
-
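Check whether it is system RAM or GPU memory that is running out. Try changing the batch size to 4, although I don't know whether that will help; I have never run into this problem.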
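One way to tell the two apart is to watch GPU memory while the evaluation runs. The sketch below assumes an NVIDIA driver with `nvidia-smi` on the PATH. The helper names `parse_gpu_memory` and `query_gpu_memory` are made up for this example.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' lines (in MiB) into a list of (used, total) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory usage; one tuple per GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory(result.stdout)


# Example (requires a GPU and driver):
#   for index, (used, total) in enumerate(query_gpu_memory()):
#       print(f"GPU {index}: {used} / {total} MiB")
```

If GPU usage climbs to the card's limit right before the crash, the problem is GPU memory. If it stays low while the process's system memory grows, the problem is RAM.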
-
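It is GPU memory that ran out. Lowering the train batch size did not help either. After training finished, running tools/infer_det.py on images also reported out of GPU memory, which I really cannot figure out...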
-
Paddle sometimes has strange bugs. Maybe try reinstalling the training environment and see (doge
-
Which version of paddlepaddle should I install? I have set everything to 1 and it still runs out of GPU memory.
-
I have the same problem and don't know how to solve it.
-
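I have the same problem and don't know how to solve it. When recognizing images on CPU, memory keeps growing until the process crashes. Please provide a way to free the memory.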
-
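I have the same problem. My card is a 1050 Ti, which is a bit old, but with nearly 4 GB of memory it should still be able to run. The thread count and batch_size are both set very small. The error says: "Please check whether there is any other process using GPU 0."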
-
What language are you training on? What kind of training samples?
------------------ Original message ------------------
From: DarrenZhangug
Sent: 2025-02-06 19:57:12
Subject: Re: [PaddlePaddle/PaddleOCR] Fine-tuning ch_PP-OCRv4_det_server_train: out of memory when evaluating the model during training (Issue #13759)
I have the same problem. My card is a 1050 Ti, which is a bit old, but with nearly 4 GB of memory it should still be able to run. The thread count and batch_size are both set very small.
`
Out of memory error on GPU 0. Cannot allocate 160.000000MB memory on GPU 0, 3.861304GB memory has been allocated and available memory is only 142.025001MB.
Please check whether there is any other process using GPU 0.
If yes, please stop them, or start PaddlePaddle on another GPU. If no, please decrease the batch size of your model.
(at ..\paddle\fluid\memory\allocation\cuda_allocator.cc:86)
`
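The numbers in this error message are worth reading closely. Below is a small sketch that pulls them out and converts them to MB; the helper name `parse_oom` and the regular expression are written for this example and only cover the message format shown above.

```python
import re

# Matches the "Cannot allocate ... on GPU N, ... allocated ... only ..." sentence.
_OOM_PATTERN = re.compile(
    r"Cannot allocate ([\d.]+)(MB|GB) memory on GPU (\d+), "
    r"([\d.]+)(MB|GB) memory has been allocated and "
    r"available memory is only ([\d.]+)(MB|GB)"
)

_TO_MB = {"MB": 1.0, "GB": 1024.0}


def parse_oom(message):
    """Extract requested/allocated/available sizes (in MB) from an OOM message.

    Returns None if the message does not match the expected format.
    """
    match = _OOM_PATTERN.search(message)
    if match is None:
        return None
    req, req_unit, gpu, alloc, alloc_unit, avail, avail_unit = match.groups()
    return {
        "gpu": int(gpu),
        "requested_mb": float(req) * _TO_MB[req_unit],
        "allocated_mb": float(alloc) * _TO_MB[alloc_unit],
        "available_mb": float(avail) * _TO_MB[avail_unit],
    }
```

For the log above, allocated plus available memory comes to about 4096 MB, the full capacity of a 4 GB card. The 160 MB request fails because only about 142 MB is free.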
-
I have also hit an almost identical problem, which is still unresolved: #14633. It is the same model, and it also goes OOM during eval.
-
Has this been solved? I have the same problem. I am using a 4090 and changed the batch size to 2, but GPU memory still runs out during evaluation.
-
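No. I converted it to an inference model and tested that instead... I use tools/infer/predict_system.py to run detection and recognition together and look at the results.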
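The workaround described here, exporting the checkpoint and then testing the inference model, can be sketched as two command lines. The helper names and every path below are placeholders for illustration. The flags follow PaddleOCR's `tools/export_model.py` and `tools/infer/predict_system.py` as I understand them, so check them against your installed version.

```python
def export_command(config_path, checkpoint_path, output_dir):
    """Build the command that converts a training checkpoint to an inference model."""
    return [
        "python3", "tools/export_model.py",
        "-c", config_path,
        "-o",
        f"Global.pretrained_model={checkpoint_path}",
        f"Global.save_inference_dir={output_dir}",
    ]


def predict_command(image_dir, det_model_dir, rec_model_dir):
    """Build the command that runs detection and recognition together."""
    return [
        "python3", "tools/infer/predict_system.py",
        f"--image_dir={image_dir}",
        f"--det_model_dir={det_model_dir}",
        f"--rec_model_dir={rec_model_dir}",
    ]


# Placeholder paths; substitute your own config, checkpoint, and model directories.
export_cmd = export_command(
    "path/to/your_det_config.yml",
    "path/to/output/best_accuracy",
    "path/to/inference/det",
)
predict_cmd = predict_command(
    "path/to/test_images",
    "path/to/inference/det",
    "path/to/inference/rec",
)
# To execute from the PaddleOCR repository root:
#   import subprocess; subprocess.run(export_cmd, check=True)
```

This sidesteps the in-training evaluation pass rather than fixing it, so it gives a visual check of results but not the eval metrics.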
-
This issue is stale because it has been open for 90 days with no activity.
-
🔎 Search before asking
🐛 Bug (problem description)
🏃‍♂️ Environment
PaddlePaddle-gpu: 2.6, PaddleOCR: 2.8, RAM: 16 GB
🌰 Minimal Reproducible Example