PaddleOCR training hangs #15008
Replies: 1 comment
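Based on the log and description you provided, PaddleOCR training slowed down sharply between 2025/04/13 and the morning of 2025/04/14: log output became sparse and training appeared to be "stuck" (avg_batch_cost rose to more than a thousand seconds and ips fell below 0.004 samples/s). Training only returned to normal speed after you checked on it at around 8 a.m., which is clearly not normal behavior. Given these symptoms, the following causes are possible:
🔍 Suggested checks and improvements:
Conclusion: the "stall" in your case is not normal behavior. It looks more like training that was blocked or degraded by system resource management or some other external factor. Raising the training process's resource priority and recording where the time is spent should let you narrow down the root cause step by step.
Response generated by 🤖 feifei-bot | chatgpt-4o-latest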
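One way to "record where the time is spent" is to wrap each stage of the training loop in a small timer. The sketch below is only an illustration: `timed`, `warn_after`, `records`, and the `stall_probe` logger name are hypothetical helpers, not part of PaddleOCR. In the log from the question, avg_reader_cost accounts for almost all of avg_batch_cost during the slow period, so the data-loading step is probably the first stage worth timing.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("stall_probe")  # hypothetical logger name


@contextmanager
def timed(stage, warn_after=30.0, records=None):
    """Measure how long one stage takes and warn when it is slow.

    stage, warn_after and records are illustrative names; none of them
    is a PaddleOCR option.
    """
    # time.time() is wall-clock time, so the values are directly
    # comparable with the timestamps printed in the training log.
    start = time.time()
    try:
        yield
    finally:
        elapsed = time.time() - start
        if records is not None:
            records.append((stage, elapsed))
        if elapsed > warn_after:
            logger.warning(
                "%s took %.1f s (limit %.1f s)", stage, elapsed, warn_after
            )
```

Wrapping the batch fetch in `with timed("data_loading", records=records):` and the forward/backward pass in `with timed("train_step", records=records):` would show which of the two grows when the slowdown starts.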
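I am training a PaddleOCR detection model on a Windows machine and found that almost no log lines were printed for about a day. Is this normal?
The log shows that very little was printed from all of 2025/04/13 until 7 a.m. on 2025/04/14. Only when I woke my office computer at 8 a.m. on 2025/04/14 to check on the remote training did log output return to normal. The training machine sits in a server room and has always been operated remotely.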
[2025/04/12 18:24:07] ppocr INFO: epoch: [41/300], global_step: 6598, lr: 0.000192, loss: 1.297133, loss_shrink_maps: 0.655527, loss_threshold_maps: 0.389204, loss_binary_maps: 0.131916, loss_cbn: 0.131916, avg_reader_cost: 1.42255 s, avg_batch_cost: 3.59446 s, avg_samples: 8.0, ips: 2.22565 samples/s, eta: 1 day, 7:59:55
[2025/04/12 18:24:12] ppocr INFO: epoch: [41/300], global_step: 6600, lr: 0.000192, loss: 1.290149, loss_shrink_maps: 0.646665, loss_threshold_maps: 0.387606, loss_binary_maps: 0.129618, loss_cbn: 0.129618, avg_reader_cost: 0.16804 s, avg_batch_cost: 2.27980 s, avg_samples: 8.0, ips: 3.50908 samples/s, eta: 1 day, 7:59:44
[2025/04/13 23:32:35] ppocr INFO: cur metric, precision: 0.9613192182410424, recall: 0.9460082139637384, hmean: 0.9536022618266269, fps: 0.05228053427132901
[2025/04/13 23:32:35] ppocr INFO: best metric, hmean: 0.9650264950794851, is_float16: False, precision: 0.9724369406021155, recall: 0.9577281378343183, fps: 6.142403921583733, best_epoch: 30
[2025/04/13 23:34:22] ppocr INFO: epoch: [41/300], global_step: 6602, lr: 0.000192, loss: 1.290149, loss_shrink_maps: 0.646665, loss_threshold_maps: 0.393739, loss_binary_maps: 0.129618, loss_cbn: 0.129618, avg_reader_cost: 0.00000 s, avg_batch_cost: 53.76953 s, avg_samples: 8.0, ips: 0.14878 samples/s, eta: 1 day, 8:10:42
[2025/04/13 23:35:11] ppocr INFO: epoch: [41/300], global_step: 6604, lr: 0.000192, loss: 1.265249, loss_shrink_maps: 0.619627, loss_threshold_maps: 0.391290, loss_binary_maps: 0.124331, loss_cbn: 0.124331, avg_reader_cost: 0.00000 s, avg_batch_cost: 24.35121 s, avg_samples: 8.0, ips: 0.32853 samples/s, eta: 1 day, 8:15:18
[2025/04/13 23:50:25] ppocr INFO: epoch: [41/300], global_step: 6606, lr: 0.000192, loss: 1.258414, loss_shrink_maps: 0.614870, loss_threshold_maps: 0.391290, loss_binary_maps: 0.123372, loss_cbn: 0.123372, avg_reader_cost: 432.99866 s, avg_batch_cost: 456.53840 s, avg_samples: 8.0, ips: 0.01752 samples/s, eta: 1 day, 9:53:21
[2025/04/14 01:16:56] ppocr INFO: epoch: [41/300], global_step: 6608, lr: 0.000192, loss: 1.259163, loss_shrink_maps: 0.619637, loss_threshold_maps: 0.391290, loss_binary_maps: 0.124085, loss_cbn: 0.124085, avg_reader_cost: 2570.26958 s, avg_batch_cost: 2595.74047 s, avg_samples: 8.0, ips: 0.00308 samples/s, eta: 1 day, 19:13:44
[2025/04/14 02:16:26] ppocr INFO: epoch: [41/300], global_step: 6610, lr: 0.000192, loss: 1.259163, loss_shrink_maps: 0.614880, loss_threshold_maps: 0.386755, loss_binary_maps: 0.123126, loss_cbn: 0.123126, avg_reader_cost: 1761.09791 s, avg_batch_cost: 1784.76847 s, avg_samples: 8.0, ips: 0.00448 samples/s, eta: 2 days, 1:38:30
[2025/04/14 03:01:11] ppocr INFO: epoch: [41/300], global_step: 6612, lr: 0.000192, loss: 1.210776, loss_shrink_maps: 0.599874, loss_threshold_maps: 0.378401, loss_binary_maps: 0.120853, loss_cbn: 0.120853, avg_reader_cost: 1310.99789 s, avg_batch_cost: 1342.41053 s, avg_samples: 8.0, ips: 0.00596 samples/s, eta: 2 days, 6:27:26
[2025/04/14 03:44:23] ppocr INFO: epoch: [41/300], global_step: 6614, lr: 0.000192, loss: 1.235519, loss_shrink_maps: 0.609533, loss_threshold_maps: 0.384384, loss_binary_maps: 0.122396, loss_cbn: 0.122396, avg_reader_cost: 1270.08932 s, avg_batch_cost: 1296.10395 s, avg_samples: 8.0, ips: 0.00617 samples/s, eta: 2 days, 11:06:10
[2025/04/14 04:15:38] ppocr INFO: epoch: [41/300], global_step: 6616, lr: 0.000192, loss: 1.235519, loss_shrink_maps: 0.609533, loss_threshold_maps: 0.384384, loss_binary_maps: 0.122396, loss_cbn: 0.122396, avg_reader_cost: 902.51412 s, avg_batch_cost: 937.53410 s, avg_samples: 8.0, ips: 0.00853 samples/s, eta: 2 days, 14:27:18
[2025/04/14 04:32:38] ppocr INFO: epoch: [41/300], global_step: 6618, lr: 0.000192, loss: 1.235519, loss_shrink_maps: 0.609533, loss_threshold_maps: 0.384384, loss_binary_maps: 0.122396, loss_cbn: 0.122396, avg_reader_cost: 477.94830 s, avg_batch_cost: 509.92180 s, avg_samples: 8.0, ips: 0.01569 samples/s, eta: 2 days, 16:16:02
[2025/04/14 05:05:48] ppocr INFO: epoch: [41/300], global_step: 6620, lr: 0.000192, loss: 1.241576, loss_shrink_maps: 0.611342, loss_threshold_maps: 0.384384, loss_binary_maps: 0.122396, loss_cbn: 0.122396, avg_reader_cost: 949.44704 s, avg_batch_cost: 995.07541 s, avg_samples: 8.0, ips: 0.00804 samples/s, eta: 2 days, 19:49:20
[2025/04/14 05:35:59] ppocr INFO: epoch: [41/300], global_step: 6622, lr: 0.000192, loss: 1.239592, loss_shrink_maps: 0.603897, loss_threshold_maps: 0.384384, loss_binary_maps: 0.121495, loss_cbn: 0.121494, avg_reader_cost: 880.55994 s, avg_batch_cost: 905.40233 s, avg_samples: 8.0, ips: 0.00884 samples/s, eta: 2 days, 23:03:09
[2025/04/14 06:08:14] ppocr INFO: epoch: [41/300], global_step: 6624, lr: 0.000192, loss: 1.239592, loss_shrink_maps: 0.603897, loss_threshold_maps: 0.384384, loss_binary_maps: 0.121495, loss_cbn: 0.121494, avg_reader_cost: 934.03237 s, avg_batch_cost: 967.19049 s, avg_samples: 8.0, ips: 0.00827 samples/s, eta: 3 days, 2:30:09
[2025/04/14 07:09:24] ppocr INFO: epoch: [41/300], global_step: 6626, lr: 0.000192, loss: 1.253609, loss_shrink_maps: 0.611342, loss_threshold_maps: 0.388694, loss_binary_maps: 0.122396, loss_cbn: 0.122396, avg_reader_cost: 1795.02338 s, avg_batch_cost: 1835.19682 s, avg_samples: 8.0, ips: 0.00436 samples/s, eta: 3 days, 9:04:03
[2025/04/14 08:10:31] ppocr INFO: epoch: [41/300], global_step: 6628, lr: 0.000192, loss: 1.245238, loss_shrink_maps: 0.612493, loss_threshold_maps: 0.384384, loss_binary_maps: 0.123145, loss_cbn: 0.123145, avg_reader_cost: 1803.44669 s, avg_batch_cost: 1833.44098 s, avg_samples: 8.0, ips: 0.00436 samples/s, eta: 3 days, 15:37:18
[2025/04/14 08:23:08] ppocr INFO: epoch: [41/300], global_step: 6630, lr: 0.000192, loss: 1.275304, loss_shrink_maps: 0.630315, loss_threshold_maps: 0.396148, loss_binary_maps: 0.126272, loss_cbn: 0.126272, avg_reader_cost: 365.43932 s, avg_batch_cost: 378.11336 s, avg_samples: 8.0, ips: 0.02116 samples/s, eta: 3 days, 16:56:53
[2025/04/14 08:23:13] ppocr INFO: epoch: [41/300], global_step: 6632, lr: 0.000192, loss: 1.307754, loss_shrink_maps: 0.646017, loss_threshold_maps: 0.402663, loss_binary_maps: 0.129537, loss_cbn: 0.129537, avg_reader_cost: 0.84605 s, avg_batch_cost: 2.55943 s, avg_samples: 8.0, ips: 3.12570 samples/s, eta: 3 days, 16:55:35
[2025/04/14 08:23:18] ppocr INFO: epoch: [41/300], global_step: 6634, lr: 0.000192, loss: 1.304048, loss_shrink_maps: 0.646017, loss_threshold_maps: 0.397016, loss_binary_maps: 0.129537, loss_cbn: 0.129537, avg_reader_cost: 1.06639 s, avg_batch_cost: 2.81578 s, avg_samples: 8.0, ips: 2.84113 samples/s, eta: 3 days, 16:54:19
[2025/04/14 08:23:24] ppocr INFO: epoch: [41/300], global_step: 6636, lr: 0.000192, loss: 1.316786, loss_shrink_maps: 0.648818, loss_threshold_maps: 0.402663, loss_binary_maps: 0.130798, loss_cbn: 0.130798, avg_reader_cost: 0.62724 s, avg_batch_cost: 2.73381 s, avg_samples: 8.0, ips: 2.92632 samples/s, eta: 3 days, 16:53:03
[2025/04/14 08:23:33] ppocr INFO: epoch: [41/300], global_step: 6638, lr: 0.000192, loss: 1.316786, loss_shrink_maps: 0.648818, loss_threshold_maps: 0.402663, loss_binary_maps: 0.130798, loss_cbn: 0.130798, avg_reader_cost: 2.65714 s, avg_batch_cost: 4.44780 s, avg_samples: 8.0, ips: 1.79864 samples/s, eta: 3 days, 16:52:09
[2025/04/14 08:23:42] ppocr INFO: epoch: [41/300], global_step: 6640, lr: 0.000192, loss: 1.317422, loss_shrink_maps: 0.650436, loss_threshold_maps: 0.407841, loss_binary_maps: 0.130958, loss_cbn: 0.130958, avg_reader_cost: 2.92228 s, avg_batch_cost: 4.73402 s, avg_samples: 8.0, ips: 1.68989 samples/s, eta: 3 days, 16:51:19
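To pin down exactly when the slowdown starts and ends, a log like the one above can be scanned automatically. The following is a minimal sketch that assumes the line format shown in this log. `find_slow_steps` and its `max_batch_cost` threshold are hypothetical names chosen for illustration, not PaddleOCR functions or settings.

```python
import re
from datetime import datetime

# Matches the timestamp, global_step and avg_batch_cost fields of a
# ppocr training log line in the format shown above.
LINE_RE = re.compile(
    r"\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] ppocr INFO: .*?"
    r"global_step: (\d+),.*?avg_batch_cost: ([\d.]+) s"
)


def find_slow_steps(lines, max_batch_cost=60.0):
    """Return (timestamp, global_step, avg_batch_cost, gap_seconds) tuples.

    Only lines whose avg_batch_cost exceeds max_batch_cost (seconds) are
    returned. gap_seconds is the wall-clock time since the previous
    training log line, or 0.0 for the first one.
    """
    slow = []
    prev_time = None
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip metric lines and anything else
        when = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
        step = int(match.group(2))
        cost = float(match.group(3))
        gap = 0.0 if prev_time is None else (when - prev_time).total_seconds()
        if cost > max_batch_cost:
            slow.append((when, step, cost, gap))
        prev_time = when
    return slow
```

Calling `find_slow_steps(open("train.log", encoding="utf-8"))` (the file name is an assumption) lists every slow step together with the time elapsed since the previous log line, which can then be compared with Windows event logs or scheduled tasks for the same period.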