OSError: (External) CUDNN error(8), CUDNN_STATUS_EXECUTION_FAILED. #14736
Jerry200404 asked this question in Q&A
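Based on the error message and environment configuration you provided, here are some things to check:
1. CUDA and cuDNN version mismatch
2. Insufficient GPU memory
3. Driver problems
4. PaddlePaddle version problems
5. Environment variable settings
6. Hardware problems
7. Debugging information
8. Related issues worth consulting
I hope these suggestions help you solve the problem. If it persists, please provide more debugging information so it can be analyzed further.
Response generated by 🤖 feifei-bot | deepseek-chat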
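Items 3, 5, 6 and 7 can be narrowed down with a diagnostic rerun. The sketch below assumes it is run from the PaddleOCR root, and `build_debug_run` is a hypothetical helper name. It builds the training command plus an environment with `CUDA_LAUNCH_BLOCKING=1`, a standard CUDA variable that makes kernel launches synchronous so the reported error points at the call that actually failed. It also sets `FLAGS_call_stack_level=2`; my understanding is that this Paddle flag adds the C++ stack to error messages, but treat that as an assumption. Passing `use_gpu=False` appends PaddleOCR's `-o Global.use_gpu=False` config override: if training then runs on CPU, the problem most likely lies in the GPU/driver/cuDNN stack rather than in the config or data.

```python
import os
import sys
from typing import Dict, List, Tuple


def build_debug_run(config: str, use_gpu: bool = True) -> Tuple[List[str], Dict[str, str]]:
    """Build argv and environment for a diagnostic PaddleOCR training run (hypothetical helper)."""
    cmd = [sys.executable, "tools/train.py", "-c", config]
    if not use_gpu:
        # PaddleOCR's -o option overrides config keys; a CPU run separates
        # GPU/driver/cuDNN problems from config or data problems.
        cmd += ["-o", "Global.use_gpu=False"]
    env = dict(os.environ)
    # Synchronous kernel launches: errors surface at the call that failed.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    # Assumption: level 2 makes Paddle include the C++ call stack in errors.
    env["FLAGS_call_stack_level"] = "2"
    return cmd, env
```

To launch it, pass the result to `subprocess.run(cmd, env=env)`, for example with `configs/rec/multi_language/rec_cyrillic_lite_train.yml` as the config.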
(hzs) PS D:\PP-OCR\PaddleOCR> python tools/train.py -c configs/rec/multi_language/rec_cyrillic_lite_train.yml
[2025/02/21 09:56:10] ppocr INFO: Architecture :
[2025/02/21 09:56:10] ppocr INFO: Backbone :
[2025/02/21 09:56:10] ppocr INFO: model_name : small
[2025/02/21 09:56:10] ppocr INFO: name : MobileNetV3
[2025/02/21 09:56:10] ppocr INFO: scale : 0.5
[2025/02/21 09:56:10] ppocr INFO: small_stride : [1, 2, 2, 2]
[2025/02/21 09:56:10] ppocr INFO: Head :
[2025/02/21 09:56:10] ppocr INFO: fc_decay : 1e-05
[2025/02/21 09:56:10] ppocr INFO: name : CTCHead
[2025/02/21 09:56:10] ppocr INFO: Neck :
[2025/02/21 09:56:10] ppocr INFO: encoder_type : rnn
[2025/02/21 09:56:10] ppocr INFO: hidden_size : 48
[2025/02/21 09:56:10] ppocr INFO: name : SequenceEncoder
[2025/02/21 09:56:10] ppocr INFO: Transform : None
[2025/02/21 09:56:10] ppocr INFO: algorithm : CRNN
[2025/02/21 09:56:10] ppocr INFO: model_type : rec
[2025/02/21 09:56:10] ppocr INFO: Eval :
[2025/02/21 09:56:10] ppocr INFO: dataset :
[2025/02/21 09:56:10] ppocr INFO: data_dir : dataset3/
[2025/02/21 09:56:10] ppocr INFO: label_file_list : ['dataset3/rec/val.txt']
[2025/02/21 09:56:10] ppocr INFO: name : SimpleDataSet
[2025/02/21 09:56:10] ppocr INFO: transforms :
[2025/02/21 09:56:10] ppocr INFO: DecodeImage :
[2025/02/21 09:56:10] ppocr INFO: channel_first : False
[2025/02/21 09:56:10] ppocr INFO: img_mode : BGR
[2025/02/21 09:56:10] ppocr INFO: CTCLabelEncode : None
[2025/02/21 09:56:10] ppocr INFO: RecResizeImg :
[2025/02/21 09:56:10] ppocr INFO: image_shape : [3, 32, 256]
[2025/02/21 09:56:10] ppocr INFO: KeepKeys :
[2025/02/21 09:56:10] ppocr INFO: keep_keys : ['image', 'label', 'length']
[2025/02/21 09:56:10] ppocr INFO: loader :
[2025/02/21 09:56:10] ppocr INFO: batch_size_per_card : 1
[2025/02/21 09:56:10] ppocr INFO: drop_last : False
[2025/02/21 09:56:10] ppocr INFO: num_workers : 1
[2025/02/21 09:56:10] ppocr INFO: shuffle : False
[2025/02/21 09:56:10] ppocr INFO: Global :
[2025/02/21 09:56:10] ppocr INFO: cal_metric_during_train : True
[2025/02/21 09:56:10] ppocr INFO: character_dict_path : ppocr/utils/dict/cyrillic_dict.txt
[2025/02/21 09:56:10] ppocr INFO: checkpoints : None
[2025/02/21 09:56:10] ppocr INFO: distributed : False
[2025/02/21 09:56:10] ppocr INFO: epoch_num : 500
[2025/02/21 09:56:10] ppocr INFO: eval_batch_step : [0, 200]
[2025/02/21 09:56:10] ppocr INFO: infer_img : None
[2025/02/21 09:56:10] ppocr INFO: infer_mode : False
[2025/02/21 09:56:10] ppocr INFO: log_smooth_window : 20
[2025/02/21 09:56:10] ppocr INFO: max_text_length : 25
[2025/02/21 09:56:10] ppocr INFO: pretrained_model : None
[2025/02/21 09:56:10] ppocr INFO: print_batch_step : 10
[2025/02/21 09:56:10] ppocr INFO: save_epoch_step : 3
[2025/02/21 09:56:10] ppocr INFO: save_inference_dir : None
[2025/02/21 09:56:10] ppocr INFO: save_model_dir : ./output/rec_cyrillic_lite
[2025/02/21 09:56:10] ppocr INFO: use_gpu : True
[2025/02/21 09:56:10] ppocr INFO: use_space_char : True
[2025/02/21 09:56:10] ppocr INFO: use_visualdl : False
[2025/02/21 09:56:10] ppocr INFO: Loss :
[2025/02/21 09:56:10] ppocr INFO: name : CTCLoss
[2025/02/21 09:56:10] ppocr INFO: Metric :
[2025/02/21 09:56:10] ppocr INFO: main_indicator : acc
[2025/02/21 09:56:10] ppocr INFO: name : RecMetric
[2025/02/21 09:56:10] ppocr INFO: Optimizer :
[2025/02/21 09:56:10] ppocr INFO: beta1 : 0.9
[2025/02/21 09:56:10] ppocr INFO: beta2 : 0.999
[2025/02/21 09:56:10] ppocr INFO: lr :
[2025/02/21 09:56:10] ppocr INFO: learning_rate : 0.001
[2025/02/21 09:56:10] ppocr INFO: name : Cosine
[2025/02/21 09:56:10] ppocr INFO: name : Adam
[2025/02/21 09:56:10] ppocr INFO: regularizer :
[2025/02/21 09:56:10] ppocr INFO: factor : 1e-05
[2025/02/21 09:56:10] ppocr INFO: name : L2
[2025/02/21 09:56:10] ppocr INFO: PostProcess :
[2025/02/21 09:56:10] ppocr INFO: name : CTCLabelDecode
[2025/02/21 09:56:10] ppocr INFO: Train :
[2025/02/21 09:56:10] ppocr INFO: dataset :
[2025/02/21 09:56:10] ppocr INFO: data_dir : dataset3/
[2025/02/21 09:56:10] ppocr INFO: label_file_list : ['dataset3/rec/train.txt']
[2025/02/21 09:56:10] ppocr INFO: name : SimpleDataSet
[2025/02/21 09:56:10] ppocr INFO: transforms :
[2025/02/21 09:56:10] ppocr INFO: DecodeImage :
[2025/02/21 09:56:10] ppocr INFO: channel_first : False
[2025/02/21 09:56:10] ppocr INFO: img_mode : BGR
[2025/02/21 09:56:10] ppocr INFO: RecAug : None
[2025/02/21 09:56:10] ppocr INFO: CTCLabelEncode : None
[2025/02/21 09:56:10] ppocr INFO: RecResizeImg :
[2025/02/21 09:56:10] ppocr INFO: image_shape : [3, 32, 256]
[2025/02/21 09:56:10] ppocr INFO: KeepKeys :
[2025/02/21 09:56:10] ppocr INFO: keep_keys : ['image', 'label', 'length']
[2025/02/21 09:56:10] ppocr INFO: loader :
[2025/02/21 09:56:10] ppocr INFO: batch_size_per_card : 1
[2025/02/21 09:56:10] ppocr INFO: drop_last : True
[2025/02/21 09:56:10] ppocr INFO: num_workers : 1
[2025/02/21 09:56:10] ppocr INFO: shuffle : True
[2025/02/21 09:56:10] ppocr INFO: profiler_options : None
[2025/02/21 09:56:10] ppocr INFO: train with paddle 2.6.1 and device Place(gpu:0)
[2025/02/21 09:56:10] ppocr INFO: Initialize indexs of datasets:['dataset3/rec/train.txt']
[2025/02/21 09:56:10] ppocr INFO: Initialize indexs of datasets:['dataset3/rec/val.txt']
W0221 09:56:10.687160 16592 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.4, Runtime API Version: 11.2
W0221 09:56:10.749336 16592 gpu_resources.cc:164] device: 0, cuDNN Version: 8.2.
INFO:root:If regularizer of a Parameter has been set by 'paddle.ParamAttr' or 'static.WeightNormParamAttr' already. The weight_decay[1e-05] in Optimizer will not take effect, and it will only be applied to other Parameters!
[2025/02/21 09:56:12] ppocr INFO: train dataloader has 8 iters
[2025/02/21 09:56:12] ppocr INFO: valid dataloader has 1 iters
[2025/02/21 09:56:12] ppocr INFO: train from scratch
[2025/02/21 09:56:12] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 200 iterations
fatal : Memory allocation failure
Traceback (most recent call last):
File "tools/train.py", line 269, in <module>
main(config, device, logger, vdl_writer, seed)
File "tools/train.py", line 222, in main
program.train(
File "D:\PP-OCR\PaddleOCR\tools\program.py", line 366, in train
preds = model(images)
File "C:\Users\HZS\.conda\envs\hzs\lib\site-packages\paddle\nn\layer\layers.py", line 1429, in __call__
return self.forward(*inputs, **kwargs)
File "D:\PP-OCR\PaddleOCR\ppocr\modeling\architectures\base_model.py", line 85, in forward
x = self.backbone(x)
return self.forward(*inputs, **kwargs)
File "C:\Users\HZS\.conda\envs\hzs\lib\site-packages\paddle\nn\layer\conv.py", line 715, in forward
out = F.conv._conv_nd(
File "C:\Users\HZS\.conda\envs\hzs\lib\site-packages\paddle\nn\functional\conv.py", line 128, in _conv_nd
pre_bias = _C_ops.conv2d(
OSError: (External) CUDNN error(8), CUDNN_STATUS_EXECUTION_FAILED.
[Hint: 'CUDNN_STATUS_EXECUTION_FAILED'. The GPU program failed to execute. This is usually caused by a failure to launch some cuDNN kernel on the GPU, which can occur for multiple reasons. To correct, check that the hardware, an appropriate version of the driver, and the cuDNN library are correctly installed. Otherwise, this may indicate an internal error/bug in the library. ] (at ../paddle/phi/kernels/gpudnn/conv_cudnn_v7.h:804)
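In the log above, `fatal : Memory allocation failure` is printed before the traceback, so the cuDNN error may be a consequence of a failed allocation (item 2 in the reply) rather than its root cause. A small helper that reports the earliest failure marker in a saved log makes this ordering easy to check. The marker list is an illustrative assumption, not something PaddleOCR defines, and matching is case-insensitive.

```python
from typing import Optional, Tuple

# Illustrative, non-exhaustive substrings that signal a failure in a
# Paddle/PaddleOCR training log.
FAILURE_MARKERS = (
    "Memory allocation failure",
    "CUDNN_STATUS_EXECUTION_FAILED",
    "out of memory",
)


def first_failure(log_text: str) -> Optional[Tuple[int, str]]:
    """Return (1-based line number, marker) for the earliest marker, or None."""
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        for marker in FAILURE_MARKERS:
            if marker.lower() in lowered:
                return lineno, marker
    return None
```

Run against a saved copy of this log, it would report the allocation failure first, which is the line worth investigating before the cuDNN error.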
Environment:
- Windows 11
- RTX 3060
- Python 3.8
- CUDA 11.2
- cuDNN 8.2.1
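Item 1 in the reply (version mismatch) can be checked against the two `gpu_resources.cc` warnings Paddle prints at start-up. The sketch below parses those lines and flags two obvious problems: a driver reporting an older CUDA version than the runtime, and a runtime too old for the GPU's compute capability. The regexes copy the wording from this log (other Paddle releases may differ), and `MIN_CUDA_FOR_CC` is a partial, hand-written table covering only two Ampere entries.

```python
import re
from typing import Dict, List, Tuple

Version = Tuple[int, int]

# Wording copied from the gpu_resources.cc warnings in this thread's log.
_GPU_LINE = re.compile(
    r"GPU Compute Capability: (\d+)\.(\d+), "
    r"Driver API Version: (\d+)\.(\d+), "
    r"Runtime API Version: (\d+)\.(\d+)"
)
_CUDNN_LINE = re.compile(r"cuDNN Version: (\d+)\.(\d+)")

# Partial table: minimum CUDA release that can target a compute capability.
MIN_CUDA_FOR_CC: Dict[Version, Version] = {
    (8, 0): (11, 0),  # A100-class GPUs
    (8, 6): (11, 1),  # RTX 30xx, e.g. RTX 3060
}


def parse_gpu_info(log_text: str) -> Dict[str, Version]:
    """Extract (major, minor) pairs from Paddle's GPU start-up warnings."""
    info: Dict[str, Version] = {}
    m = _GPU_LINE.search(log_text)
    if m:
        n = [int(g) for g in m.groups()]
        info["compute_capability"] = (n[0], n[1])
        info["driver_api"] = (n[2], n[3])
        info["runtime_api"] = (n[4], n[5])
    m = _CUDNN_LINE.search(log_text)
    if m:
        info["cudnn"] = (int(m.group(1)), int(m.group(2)))
    return info


def version_warnings(info: Dict[str, Version]) -> List[str]:
    """Flag obvious CUDA version problems in parsed GPU info."""
    problems: List[str] = []
    drv = info.get("driver_api")
    rt = info.get("runtime_api")
    # Normally the driver should report a CUDA version at least as new as the runtime.
    if drv and rt and drv < rt:
        problems.append(f"driver reports CUDA {drv[0]}.{drv[1]}, runtime is {rt[0]}.{rt[1]}")
    cc = info.get("compute_capability")
    needed = MIN_CUDA_FOR_CC.get(cc) if cc else None
    if needed and rt and rt < needed:
        problems.append(f"compute capability {cc[0]}.{cc[1]} needs CUDA {needed[0]}.{needed[1]}+")
    return problems
```

Fed the two warning lines from this log, `version_warnings` returns an empty list: the driver (12.4) is newer than the runtime (11.2), and CUDA 11.2 can target compute capability 8.6. That points away from a plain version mismatch, though it cannot rule out a damaged cuDNN installation.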
Setting up this environment is really hard.
I'm close to breaking down.