Want to finetune ppocr_v5 with handwritten dataset #16128

Delipan-regami · 2025-07-24T08:13:02Z

How can I do that? I tried the official config file, but I'm hitting this error:

Traceback (most recent call last):
File "/content/PaddleOCR/tools/train.py", line 272, in <module>
main(config, device, logger, vdl_writer, seed)
File "/content/PaddleOCR/tools/train.py", line 225, in main
program.train(
File "/content/PaddleOCR/tools/program.py", line 356, in train
preds = model(images, data=batch[1:])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
return self.forward(*inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/content/PaddleOCR/ppocr/modeling/architectures/base_model.py", line 99, in forward
x = self.head(x, targets=data)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
return self.forward(*inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/content/PaddleOCR/ppocr/modeling/heads/rec_multi_head.py", line 151, in forward
gtc_out = self.gtc_head(self.before_gtc(x), targets[1:])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
return self.forward(*inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/content/PaddleOCR/ppocr/modeling/heads/rec_nrtr_head.py", line 155, in forward
return self.forward_train(src, tgt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/content/PaddleOCR/ppocr/modeling/heads/rec_nrtr_head.py", line 135, in forward_train
tgt = decoder_layer(tgt, memory, self_mask=tgt_mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
return self.forward(*inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/content/PaddleOCR/ppocr/modeling/heads/rec_nrtr_head.py", line 480, in forward
tgt1 = self.self_attn(tgt, attn_mask=self_mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
return self.forward(*inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/content/PaddleOCR/ppocr/modeling/heads/rec_nrtr_head.py", line 404, in forward
self.qkv(query)
File "/usr/local/lib/python3.11/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
return self.forward(*inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/paddle/nn/layer/common.py", line 185, in forward
out = F.linear(
^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/paddle/nn/functional/common.py", line 1960, in linear
return _C_ops.linear(x, weight, bias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: (External) CUBLAS error(13).
[Hint: 'CUBLAS_STATUS_EXECUTION_FAILED'. The GPU program failed to execute. This is often caused by a launch failure of the kernel on the GPU, which can be caused by multiple reasons. To correct: check that the hardware, an appropriate version of the driver, and the cuBLAS library are correctly installed. ] (at /paddle/paddle/phi/kernels/funcs/blas/blas_impl.cu.h:41)
[operator < linear > error]

I used the following config:
Global:
  model_name: PP-OCRv5_server_rec
  debug: false
  use_gpu: true
  epoch_num: 5
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/PP-OCRv5_server_rec
  save_epoch_step: 1
  eval_batch_step: [0, 2000] # Consider reducing for quicker debugging feedback
  cal_metric_during_train: true
  calc_epoch_interval: 1
  # CRITICAL CHECK 1: Ensure this path is correct and the file exists
  pretrained_model: /content/PaddleOCR/pretrain_models/ppocrv5_rec/PP-OCRv5_server_rec_pretrained.pdparams
  save_inference_dir: /content/drive/MyDrive/recognition
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  # CRITICAL CHECK 2: Ensure this path is correct and the file exists
  character_dict_path: ./ppocr/utils/dict/ppocrv5_dict.txt
  max_text_length: &max_text_length 25 # CRITICAL CHECK 3: Ensure no label in your data exceeds this length
  infer_mode: false
  use_space_char: true # Adds space to the dictionary. Make sure your dict file accounts for this or doesn't have an extra space.
  distributed: false # Set to false for single GPU training. Your log showed distributed=False initially.
  save_res_path: ./output/rec/predicts_ppocrv5.txt
  d2s_train_image_shape: [3, 48, 320]

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0005 # Might need adjustment based on dataset size/validation performance
    warmup_epoch: 1
  regularizer:
    name: L2
    factor: 3.0e-05

Architecture:
  model_type: rec
  algorithm: SVTR_HGNet
  Transform:
  Backbone:
    name: PPHGNetV2_B4
    text_rec: True
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 120
            depth: 2
            hidden_dims: 120
            kernel_size: [1, 3]
            use_guide: True
          Head:
            fc_decay: 0.00001
      - NRTRHead:
          nrtr_dim: 384
          max_text_length: *max_text_length

Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - NRTRLoss:

PostProcess:
  name: CTCLabelDecode # Used for final prediction/inference, usually CTC for this architecture

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: /content/data/processed_data/images # CRITICAL CHECK 4: Ensure this path is correct
    ext_op_transform_idx: 1
    # CRITICAL CHECK 5: Ensure this path is correct and the file exists and is formatted correctly (path\tlabel)
    label_file_list:
      - /content/train_list_fixed.txt
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - RecAug: # Data augmentation - might be okay, but disable temporarily if debugging data issues
      - MultiLabelEncode:
          gtc_encode: NRTRLabelEncode # This uses character_dict_path from Global
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_gtc
            - length
            - valid_ratio
  sampler:
    name: MultiScaleSampler
    scales: [[320, 32], [320, 48], [320, 64]]
    first_bs: &bs 128 # Reduce if you encounter OOM (Out-Of-Memory) errors
    fix_bs: false
    divided_factor: [8, 16]
    is_training: True
  loader:
    shuffle: true
    batch_size_per_card: *bs
    drop_last: true
    num_workers: 4 # Reduce if you encounter data loading issues

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /content/data/processed_data/images # CRITICAL CHECK 4: Ensure this path is correct
    # CRITICAL CHECK 5: Often, a separate validation list is used here
    label_file_list:
      - /content/train_list_fixed.txt # <-- POTENTIAL ISSUE: Using training list for eval? Consider a separate val list.
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - MultiLabelEncode:
          gtc_encode: NRTRLabelEncode # This also uses character_dict_path from Global
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_gtc
            - length
            - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 128 # Reduce if needed
    num_workers: 2 # Reduce if needed

This is my config file.
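The comments in the config ask that every label fit within max_text_length, that the label file use the path\tlabel format, and that the dictionary match the data. A minimal sketch of such a data sanity pass is below. The function names `load_charset` and `find_label_problems` are hypothetical helpers written for this illustration, not part of PaddleOCR, and the exact length threshold PaddleOCR's label encoders apply may differ from the simple comparison used here. This check is not claimed to explain the cuBLAS error; it only rules out the data issues the config comments mention.

```python
import tempfile
from pathlib import Path


def load_charset(dict_path, use_space_char=True):
    """Read a dictionary file with one character per line into a set."""
    chars = set()
    with open(dict_path, encoding="utf-8") as f:
        for line in f:
            chars.add(line.rstrip("\r\n"))
    chars.discard("")  # ignore blank lines
    if use_space_char:
        chars.add(" ")  # mirrors use_space_char: true in the config
    return chars


def find_label_problems(label_path, charset, max_text_length=25):
    """Return (line_number, message) pairs for suspicious label lines."""
    problems = []
    with open(label_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue
            parts = line.split("\t")
            if len(parts) != 2:
                problems.append((lineno, "expected exactly one tab between path and label"))
                continue
            label = parts[1]
            if not label:
                problems.append((lineno, "empty label"))
            if len(label) > max_text_length:
                problems.append((lineno, f"label length {len(label)} exceeds {max_text_length}"))
            unknown = sorted(set(label) - charset)
            if unknown:
                problems.append((lineno, f"characters not in dictionary: {unknown}"))
    return problems


if __name__ == "__main__":
    # Self-contained demo on throwaway files; swap in the real paths from the config.
    with tempfile.TemporaryDirectory() as tmp:
        dict_file = Path(tmp) / "dict.txt"
        label_file = Path(tmp) / "labels.txt"
        dict_file.write_text("a\nb\nc\n", encoding="utf-8")
        label_file.write_text("img1.png\tabc\nimg2.png\tabd\nimg3.png abc\n", encoding="utf-8")
        charset = load_charset(dict_file)
        for lineno, message in find_label_problems(label_file, charset):
            print(f"line {lineno}: {message}")
```

For the config above, the real inputs would be ./ppocr/utils/dict/ppocrv5_dict.txt and /content/train_list_fixed.txt, with max_text_length=25.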

I installed this Paddle version and used this pretrained model:
!pip install paddlepaddle-gpu==2.6.1 -f https://www.paddlepaddle.org.cn/whl/mkl/avx/stable.html
!wget https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv5_server_rec_pretrained.pdparams

liuhongen1234567 (Collaborator) · 2025-08-09T07:00:21Z

Hello, the issue appears to be caused by an incompatibility between your Paddle version and your GPU environment. Try upgrading Paddle to version 3.1 or above; see the Paddle Installation Guide on the official Paddle website.
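To confirm which Paddle build is actually installed before and after upgrading, a quick check along these lines may help. It is a sketch under assumptions: `parse_version` and `meets_minimum` are hypothetical helpers written for this illustration, the 3.1 threshold comes from the suggestion above, and development builds may report a version string that this simple comparison does not handle. `paddle.utils.run_check()` is Paddle's own installation check.

```python
import re


def parse_version(version):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum=(3, 1)):
    """True if the version is at least `minimum` (3.1 by default)."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    try:
        import paddle
    except ImportError:
        print("paddle is not installed in this environment")
    else:
        print("paddle version:", paddle.__version__)
        print("at least 3.1:", meets_minimum(paddle.__version__))
        print("compiled with CUDA:", paddle.device.is_compiled_with_cuda())
        # Runs a small computation on the available device(s) to verify the install.
        paddle.utils.run_check()
```

With the paddlepaddle-gpu==2.6.1 install from the original post, `meets_minimum` would report False.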
