Error when finetuning Det V5 model: ((PreconditionNotMet) The element size of transformed_input should be <= INT_MAX(2147483647), but got 2463918080) #16083

CQHofsns · 2025-07-18T04:28:42Z


Hi everyone and Paddle devs,

First, thank you for open-sourcing this project; it is much appreciated.

I am following this tutorial on fine-tuning the PP-OCRv5 detection module and have modified the YAML config to fit my setup, as shown below:

Global:
  model_name: PP-OCRv5_server_det # To use static model for inference.
  debug: false
  use_gpu: true
  epoch_num: &epoch_num 150
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./finetune_models/en_PP-OCR_v5_det/ ## Change here
  save_epoch_step: 15
  eval_batch_step:
  - 0
  - 1220 ## Change here
  cal_metric_during_train: false
  checkpoints:
  pretrained_model: ./pretrain_models/PP-OCRv5/detection_model/PP-OCRv5_server_det_pretrained.pdparams ## Change here
  save_inference_dir: null
  use_visualdl: false
  infer_img: doc/imgs_en/img_10.jpg
  save_res_path: ./checkpoints/det_db/predicts_db.txt
  distributed: true

Architecture:
  model_type: det
  algorithm: DB
  Transform: null
  Backbone:
    name: PPHGNetV2_B4
    det: True
  Neck:
    name: LKPAN
    out_channels: 256
    intracl: true
  Head:
    name: PFHeadLocal
    k: 50
    mode: "large"
    

Loss:
  name: DBLoss
  balance_loss: true
  main_loss_type: DiceLoss
  alpha: 5
  beta: 10
  ohem_ratio: 3

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0001 #(8*8c) ## Change here
    warmup_epoch: 2
  regularizer:
    name: L2
    factor: 1e-6

PostProcess:
  name: DBPostProcess
  thresh: 0.3
  box_thresh: 0.6
  max_candidates: 1000
  unclip_ratio: 1.5

Metric:
  name: DetMetric
  main_indicator: hmean

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/paddleocr_train_data ## Change here
    label_file_list:
      - ./train_data/finetune_labels/det_train_gt.txt ## Change here
    ratio_list: [1.0]
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - CopyPaste: null
    - IaaAugment:
        augmenter_args:
        - type: Fliplr
          args:
            p: 0.5
        - type: Affine
          args:
            rotate:
            - -10
            - 10
        - type: Resize
          args:
            size:
            - 0.5
            - 3
    - EastRandomCropData:
        size:
        - 640
        - 640
        max_tries: 50
        keep_ratio: true
    - MakeBorderMap:
        shrink_ratio: 0.4
        thresh_min: 0.3
        thresh_max: 0.7
        total_epoch: *epoch_num
    - MakeShrinkMap:
        shrink_ratio: 0.4
        min_text_size: 8
        total_epoch: *epoch_num
    - NormalizeImage:
        scale: 1./255.
        mean:
        - 0.485
        - 0.456
        - 0.406
        std:
        - 0.229
        - 0.224
        - 0.225
        order: hwc
    - ToCHWImage: null
    - KeepKeys:
        keep_keys:
        - image
        - threshold_map
        - threshold_mask
        - shrink_map
        - shrink_mask
  loader:
    shuffle: true
    drop_last: false
    batch_size_per_card: 8
    num_workers: 8

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/paddleocr_train_data ## Change here
    label_file_list:
      - ./train_data/finetune_labels/det_val_gt.txt ## Change here
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - DetResizeForTest:
        limit_side_len: 2048 ## IMPORTANT CHANGE HERE
    - NormalizeImage:
        scale: 1./255.
        mean:
        - 0.485
        - 0.456
        - 0.406
        std:
        - 0.229
        - 0.224
        - 0.225
        order: hwc
    - ToCHWImage: null
    - KeepKeys:
        keep_keys:
        - image
        - shape
        - polys
        - ignore_tags
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 1
    num_workers: 4
    use_shared_memory: false ## Change here
  
profiler_options: null

# Add this block to your existing config.yml
wandb:
  project: paddleocr_detection_finetune 
  name: det_v5_finetune_run_1

However, during fine-tuning I got this error in the first evaluation run:

eval model::  41%|████      | 709/1722 [07:08<05:38,  2.99it/s]
eval model::  41%|████      | 710/1722 [07:08<05:03,  3.33it/s]
Traceback (most recent call last):
  File "XXX/Doc2Text/OCR/PaddleOCR/tools/train.py", line 272, in <module>
    main(config, device, logger, vdl_writer, seed)
  File "XXX/Doc2Text/OCR/PaddleOCR/tools/train.py", line 225, in main
    program.train(
  File "XXX/Doc2Text/OCR/PaddleOCR/tools/program.py", line 498, in train
    cur_metric = eval(
  File "XXX/Doc2Text/OCR/PaddleOCR/tools/program.py", line 713, in eval
    preds = model(images)
  File "XXX/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1571, in __call__
    return self.forward(*inputs, **kwargs)
  File "XXX/Doc2Text/OCR/PaddleOCR/ppocr/modeling/architectures/base_model.py", line 99, in forward
    x = self.head(x, targets=data)
  File "XXX/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1571, in __call__
    return self.forward(*inputs, **kwargs)
  File "XXX/Doc2Text/OCR/PaddleOCR/ppocr/modeling/heads/det_db_head.py", line 151, in forward
    cbn_maps = self.cbn_layer(self.up_conv(f), shrink_maps, None)
  File "XXX/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1571, in __call__
    return self.forward(*inputs, **kwargs)
  File "XXX/Doc2Text/OCR/PaddleOCR/ppocr/modeling/heads/det_db_head.py", line 133, in forward
    out = self.last_1(self.last_3(outf))
  File "XXX/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1571, in __call__
    return self.forward(*inputs, **kwargs)
  File "XXX/Doc2Text/OCR/PaddleOCR/ppocr/modeling/backbones/det_mobilenet_v3.py", line 187, in forward
    x = self.conv(x)
  File "XXX/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1571, in __call__
    return self.forward(*inputs, **kwargs)
  File "XXX/lib/python3.10/site-packages/paddle/nn/layer/conv.py", line 771, in forward
    out = F.conv._conv_nd(
  File "XXX/lib/python3.10/site-packages/paddle/nn/functional/conv.py", line 150, in _conv_nd
    pre_bias = _C_ops.conv2d(
RuntimeError: (PreconditionNotMet) The element size of transformed_input should be <= INT_MAX(2147483647), but got 2463918080
  [Hint: Expected transformed_input.numel() <= largest, but received transformed_input.numel():2463918080 > largest:2147483647.] (at /paddle/paddle/phi/kernels/gpudnn/conv_kernel.cu:498)

Following a suggestion from Gemini, I set limit_side_len in DetResizeForTest to 2048, and also tried 1024, but the same error keeps appearing: whatever value I configure, the reported element count is exactly 2463918080, which exceeds INT_MAX.

According to Gemini, limit_side_len should shrink the longest side of the image to the target size (here 2048). If that were happening, the largest tensor would be 2048 x 2048 x 256 (out_channels) = 1,073,741,824 elements, still below INT_MAX (2147483647). Yet with either 1024 or 2048, the error reports exactly the same 2463918080.
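A quick way to check this arithmetic is a few lines of plain Python (no Paddle needed). Two points here are assumptions, not taken from the error message itself. First, the check in conv_kernel.cu is on `transformed_input`, i.e. the convolution's input, so the relevant channel count is that layer's input channels rather than the neck's `out_channels`. Second, reading the `PFHeadLocal` source, the failing layer (`last_3` in det_db_head.py, per the traceback) appears to receive the 64-channel head feature concatenated with the 1-channel shrink map at full input resolution, i.e. 65 channels. The helpers `numel` and `max_square_side` are hypothetical names for illustration; eval uses `batch_size_per_card: 1` per the config.

```python
import math

INT_MAX = 2**31 - 1  # 2147483647, the limit quoted in the cuDNN conv error


def numel(*dims: int) -> int:
    """Number of elements in a tensor with the given shape."""
    return math.prod(dims)


def max_square_side(channels: int, batch: int = 1, stride: int = 32) -> int:
    """Largest square side (a multiple of `stride`) whose N x C x S x S tensor fits in INT_MAX."""
    side = math.isqrt(INT_MAX // (batch * channels))
    return side // stride * stride


# The estimate from the post: batch 1, 256 channels, 2048 x 2048.
estimate = numel(1, 256, 2048, 2048)
print(f"estimate {estimate:,} fits: {estimate <= INT_MAX}")

# ASSUMPTION: the failing conv input has 65 channels (64 features + 1 shrink map).
reported = 2463918080
channels = 65
pixels, rem = divmod(reported, channels)
print(f"H*W = {pixels:,}, remainder {rem}, multiple of 32*32: {pixels % (32 * 32) == 0}")
print("largest square side under INT_MAX:", max_square_side(channels))
```

If the 65-channel assumption holds, the reported count divides exactly into an H*W of 37,906,432 whose sides can both be multiples of 32 (5312 x 7136 is one possible pair; this is arithmetic, not a measured shape). That would mean the map is far larger than 2048 on each side. `max_square_side` is only a rough bound for square inputs.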

My questions:

1. Is there any way to overcome this? Some of my images are large (over 3000 pixels). Training crops the images, but evaluation may run detection on the full-size image.
2. Why does the error seem to come from the det_mobilenet_v3.py backbone? I am using the PP-OCRv5 model, which should use PPHGNetV2_B4 as set in the YAML config. I tried to look up the PPHGNetV2_B4 backbone in the repository but only found source code for the recognition model.
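On whether evaluation sees the full-size image: the sketch below re-implements, in plain Python, the side-length logic that `DetResizeForTest` appears to use in recent PaddleOCR source (`ppocr/data/imaug/operators.py`). The key assumption is that when only `limit_side_len` is set, `limit_type` defaults to `"min"`, which only enlarges images whose shortest side is below the limit and never shrinks large ones, while `limit_type: "max"` caps the longest side instead. Please verify this against your own checkout; the function name `det_resize_shape` and the 3000 x 4000 image size are hypothetical.

```python
from typing import Tuple


def det_resize_shape(h: int, w: int, limit_side_len: int,
                     limit_type: str = "min") -> Tuple[int, int]:
    """Hypothetical re-implementation of DetResizeForTest's target-size logic.

    ASSUMPTION: mirrors recent PaddleOCR source, where limit_type defaults
    to "min" when only limit_side_len is configured.
    """
    if limit_type == "max":
        # Shrink only when the longest side exceeds the limit.
        ratio = limit_side_len / max(h, w) if max(h, w) > limit_side_len else 1.0
    elif limit_type == "min":
        # Enlarge only when the shortest side is below the limit; never shrinks.
        ratio = limit_side_len / min(h, w) if min(h, w) < limit_side_len else 1.0
    else:
        raise ValueError(f"unsupported limit_type: {limit_type!r}")
    # Each side is then rounded to the nearest multiple of 32 (at least 32).
    resize_h = max(int(round(int(h * ratio) / 32) * 32), 32)
    resize_w = max(int(round(int(w * ratio) / 32) * 32), 32)
    return resize_h, resize_w


# A hypothetical 3000 x 4000 image:
for limit in (1024, 2048):
    print(limit, "min:", det_resize_shape(3000, 4000, limit))
print(2048, "max:", det_resize_shape(3000, 4000, 2048, "max"))
```

Under this assumption, any image whose shortest side already exceeds the limit passes through unchanged apart from rounding to multiples of 32, which would be consistent with 1024 and 2048 producing the same element count.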

Sorry if I have misunderstood anything; I am still learning. Any help is highly appreciated.

Thanks
