When training PP-OCRv5_server_rec, is data loaded in a streaming fashion? If not, how can streaming data loading be enabled? #16213
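The training dataset is large, on the order of millions of samples, and training stalls at the data-loading stage. How can I enable streaming data loading?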
Answered by
liuhongen1234567
Aug 13, 2025
Replies: 1 comment 6 replies
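Hello. When training PP-OCRv5_server_rec, the txt label file is read once for initialization, and a dataloader then iterates over the data, so data reading should be fast. Our internal training runs on millions or even tens of millions of samples have not shown any data-loading stalls. It would be worth profiling the time spent in each step more closely.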
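As a starting point for the per-step timing suggested above, here is a minimal sketch that splits wall-clock time into "waiting for the next batch" and "running the training step". It is framework-agnostic and is not PaddleOCR code: `profile_loop`, `batches`, and `step_fn` are hypothetical names introduced for illustration, and `batches` stands for any iterable, such as a dataloader.

```python
import time
from typing import Any, Callable, Dict, Iterable


def profile_loop(
    batches: Iterable[Any],
    step_fn: Callable[[Any], None],
    max_steps: int = 50,
) -> Dict[str, float]:
    """Measure time spent fetching batches vs. time spent in step_fn.

    Hypothetical helper, not part of PaddleOCR. On a GPU, step_fn should
    synchronize the device before returning, otherwise asynchronous kernel
    launches make the step time look smaller than it is.
    """
    data_s = 0.0
    step_s = 0.0
    steps = 0
    it = iter(batches)
    while steps < max_steps:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time blocked waiting on the data pipeline
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)  # forward/backward/optimizer step would go here
        t2 = time.perf_counter()
        data_s += t1 - t0
        step_s += t2 - t1
        steps += 1
    return {"steps": steps, "data_s": data_s, "step_s": step_s}
```

If `data_s` dominates `step_s` over a few dozen steps, the bottleneck is likely in reading or preprocessing. If `data_s` is small, the stall is probably elsewhere, for example in the one-time label-file initialization or on a particular device.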
Alternatively, how about using CUDA_VISIBLE_DEVICES=XX to exclude the GPU whose utilization is 0?
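For illustration, the same restriction can be applied from inside a Python launcher script, as long as the variable is set before the deep-learning framework initializes CUDA. This is only a sketch: the GPU count (4) and the idle index (2) are made-up values standing in for the unspecified `XX`, and `visible_devices` is a hypothetical helper.

```python
import os


def visible_devices(total: int, exclude: set) -> str:
    """Build a CUDA_VISIBLE_DEVICES value that skips the given GPU indices."""
    return ",".join(str(i) for i in range(total) if i not in exclude)


# Assumed example values: 4 GPUs on the machine, GPU 2 shows 0% utilization.
os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices(4, {2})

# Import the framework and start training only after this point, so that it
# sees just the remaining devices.
```

Inside the process, the remaining GPUs are renumbered from 0, so any device IDs passed to the training launcher refer to the filtered list.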