SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception. #12784
Unanswered. LoveSimons asked this question in Q&A.
Replies: 3 comments
- Have you solved this yet?
- I've run into the same problem.
- I can't tell whether this is a data problem or a problem caused by multithreading.
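One way to narrow down the data-versus-multithreading question is to scan the samples in the main process, with no DataLoader workers involved, and count the array shapes seen in each field. The sketch below assumes the dataset is indexable and returns a tuple of array-like fields per sample. `scan_field_shapes`, `fields_with_mixed_shapes`, and `toy_dataset` are hypothetical names used only for illustration, not part of PaddleOCR or Paddle.

```python
from collections import Counter, defaultdict

import numpy as np


def scan_field_shapes(dataset):
    """Count the array shapes seen in each field, iterating in the main process."""
    shapes = defaultdict(Counter)
    for index in range(len(dataset)):
        sample = dataset[index]
        for field, value in enumerate(sample):
            shapes[field][np.asarray(value).shape] += 1
    return shapes


def fields_with_mixed_shapes(shapes):
    """Return the field positions that hold more than one distinct shape."""
    return sorted(field for field, counts in shapes.items() if len(counts) > 1)


# Hypothetical stand-in for a real dataset: (image, label) pairs,
# where the last label has a different length from the others.
toy_dataset = [
    (np.zeros((3, 48, 320)), np.zeros(25)),
    (np.zeros((3, 48, 320)), np.zeros(25)),
    (np.zeros((3, 48, 320)), np.zeros(30)),
]

report = scan_field_shapes(toy_dataset)
print(fields_with_mixed_shapes(report))
```

If a scan like this reports mixed shapes for some field, the samples themselves are inconsistent and batching would fail regardless of how many workers are used.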
Please provide the following complete information to quickly locate the problem
...
[2023/08/18 09:14:58] ppocr INFO: epoch: [1/100], global_step: 280, lr: 0.000500, acc: 0.000000, norm_edit_dis: 0.042185, loss: 20161.078125, avg_reader_cost: 0.00043 s, avg_batch_cost: 0.84911 s, avg_samples: 128.0, ips: 150.74659 samples/s, eta: 1 day, 9:25:50
[2023/08/18 09:15:06] ppocr INFO: epoch: [1/100], global_step: 290, lr: 0.000500, acc: 0.000000, norm_edit_dis: 0.043604, loss: 20468.968750, avg_reader_cost: 0.00030 s, avg_batch_cost: 0.84909 s, avg_samples: 128.0, ips: 150.74880 samples/s, eta: 1 day, 9:24:34
[2023/08/18 09:15:15] ppocr INFO: epoch: [1/100], global_step: 300, lr: 0.000500, acc: 0.000000, norm_edit_dis: 0.043811, loss: 20012.769531, avg_reader_cost: 0.00031 s, avg_batch_cost: 0.84916 s, avg_samples: 128.0, ips: 150.73771 samples/s, eta: 1 day, 9:23:22
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/dataloader_iter.py", line 536, in _thread_loop
batch = self._get_data()
File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/dataloader_iter.py", line 674, in _get_data
batch.reraise()
File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/worker.py", line 172, in reraise
raise self.exc_type(msg)
ValueError: DataLoader worker(1) caught ValueError with message:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/worker.py", line 339, in _worker_loop
batch = fetcher.fetch(indices)
File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/fetcher.py", line 138, in fetch
data = self.collate_fn(data)
File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/collate.py", line 77, in default_collate_fn
return [default_collate_fn(fields) for fields in zip(*batch)]
File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/collate.py", line 77, in <listcomp>
return [default_collate_fn(fields) for fields in zip(*batch)]
File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/collate.py", line 58, in default_collate_fn
batch = np.stack(batch, axis=0)
File "<array_function internals>", line 6, in stack
File "/usr/local/lib/python3.7/dist-packages/numpy/core/shape_base.py", line 426, in stack
raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape
Traceback (most recent call last):
File "tools/train.py", line 227, in <module>
main(config, device, logger, vdl_writer)
File "tools/train.py", line 202, in main
amp_dtype)
File "/home/project_glf/PaddleOCR-release-2.7/tools/program.py", line 269, in train
for idx, batch in enumerate(train_dataloader):
File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/dataloader_iter.py", line 745, in __next__
self.reader.read_next_list()[0])
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
[Hint: Expected killed != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:175)
I0818 09:15:22.301784 929 tcp_store.cc:257] receive shutdown event and so quit from MasterDaemon run loop
[2023-08-18 09:15:26,667] [ INFO] launch_utils.py:329 - terminate process group gid:838
INFO 2023-08-18 09:15:26,667 launch_utils.py:329] terminate process group gid:838
[2023-08-18 09:15:26,668] [ INFO] launch_utils.py:329 - terminate process group gid:844
INFO 2023-08-18 09:15:26,668 launch_utils.py:329] terminate process group gid:844
[2023-08-18 09:15:26,668] [ INFO] launch_utils.py:329 - terminate process group gid:850
INFO 2023-08-18 09:15:26,668 launch_utils.py:329] terminate process group gid:850
[2023-08-18 09:15:33,677] [ INFO] launch_utils.py:350 - terminate all the procs
INFO 2023-08-18 09:15:33,677 launch_utils.py:350] terminate all the procs
[2023-08-18 09:15:33,677] [ ERROR] launch_utils.py:659 - ABORT!!! Out of all 4 trainers, the trainer process with rank=[0] was aborted. Please check its log.
ERROR 2023-08-18 09:15:33,677 launch_utils.py:659] ABORT!!! Out of all 4 trainers, the trainer process with rank=[0] was aborted. Please check its log.
[2023-08-18 09:15:37,682] [ INFO] launch_utils.py:350 - terminate all the procs
INFO 2023-08-18 09:15:37,682 launch_utils.py:350] terminate all the procs
[2023-08-18 09:15:37,682] [ WARNING] launch.py:424 - Terminating... exit
WARNING 2023-08-18 09:15:37,682 launch.py:424] Terminating... exit
[2023-08-18 09:15:41,686] [ INFO] launch_utils.py:350 - terminate all the procs
INFO 2023-08-18 09:15:41,686 launch_utils.py:350] terminate all the procs
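The innermost error in the traceback above is the `ValueError` raised by `np.stack` inside `default_collate_fn`. The blocking-queue `SystemError` is what the main process reports after the worker has failed. The sketch below reproduces that kind of failure outside any DataLoader and reports which field of the batch has mismatched shapes. `stack_fields` is a simplified, hypothetical stand-in for a collate step, not Paddle's actual implementation, and the array shapes are made up for illustration.

```python
import numpy as np


def stack_fields(batch):
    """Simplified stand-in for a default collate step: stack each field across samples."""
    stacked = []
    for field, values in enumerate(zip(*batch)):
        try:
            stacked.append(np.stack(values, axis=0))
        except ValueError as exc:
            # Re-raise with the per-sample shapes so the bad field is easy to spot.
            shapes = [np.asarray(v).shape for v in values]
            raise ValueError(f"field {field} has shapes {shapes}: {exc}") from exc
    return stacked


# Two samples whose fields agree in shape, and two whose second field does not.
good = [(np.zeros((2, 4)), np.zeros(5)), (np.zeros((2, 4)), np.zeros(5))]
bad = [(np.zeros((2, 4)), np.zeros(5)), (np.zeros((2, 4)), np.zeros(6))]

print([a.shape for a in stack_fields(good)])

try:
    stack_fields(bad)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

A wrapper of this kind could be passed as a custom `collate_fn` while debugging, assuming the dataset returns tuples of arrays, so that the failing field and shapes appear in the worker's error message.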