Trained a rec_multi_language model from scratch and keep getting index out of range on inferencing #15399

myoussefa96 · 2025-05-26T08:13:23Z

Hi everyone,

I trained a CRNN model from scratch on my own font and dictionary. This is my config file:

Global:
  use_gpu: True
  epoch_num: 2000
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_multi_language_lite2
  save_epoch_step: 3
  # evaluation is run every 2000 iterations after the 0th iteration
  eval_batch_step: [0, 2000]
  # if pretrained_model is saved in static mode, load_static_weights must be set to True
  cal_metric_during_train: True
  pretrained_model: 
  checkpoints: 
  save_inference_dir: 
  use_visualdl: False
  infer_img:
  # for data or label process
  character_dict_path: dict.txt
  # Set the language of training, if set, select the default dictionary file
  character_type: 
  max_text_length: 25
  infer_mode: False
  use_space_char: True


Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
  regularizer:
    name: 'L2'
    factor: 0.00001

Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: small
    small_stride: [1, 2, 2, 2]
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 48
  Head:
    name: CTCHead
    fc_decay: 0.00001

Loss:
  name: CTCLoss

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: SimpleDataSet
    data_dir: paddleocr_data_big/train/
    label_file_list: ["paddleocr_data_big/gt_train.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 320]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 128
    drop_last: True
    num_workers: 32

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: paddleocr_data_big/val/
    label_file_list: ["paddleocr_data_big/gt_val.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 320]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 128
    num_workers: 32

This is what my training logs look like at the beginning:

usr/local/lib/python3.10/dist-packages/paddle/utils/cpp_extension/extension_utils.py:711: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
  warnings.warn(warning_message)
[2025/05/24 14:01:05] ppocr WARNING: Skipping import of the encryption module.
[2025/05/24 14:01:05] ppocr INFO: Architecture : 
[2025/05/24 14:01:05] ppocr INFO:     Backbone : 
[2025/05/24 14:01:05] ppocr INFO:         model_name : small
[2025/05/24 14:01:05] ppocr INFO:         name : MobileNetV3
[2025/05/24 14:01:05] ppocr INFO:         scale : 0.5
[2025/05/24 14:01:05] ppocr INFO:         small_stride : [1, 2, 2, 2]
[2025/05/24 14:01:05] ppocr INFO:     Head : 
[2025/05/24 14:01:05] ppocr INFO:         fc_decay : 1e-05
[2025/05/24 14:01:05] ppocr INFO:         name : CTCHead
[2025/05/24 14:01:05] ppocr INFO:     Neck : 
[2025/05/24 14:01:05] ppocr INFO:         encoder_type : rnn
[2025/05/24 14:01:05] ppocr INFO:         hidden_size : 48
[2025/05/24 14:01:05] ppocr INFO:         name : SequenceEncoder
[2025/05/24 14:01:05] ppocr INFO:     Transform : None
[2025/05/24 14:01:05] ppocr INFO:     algorithm : CRNN
[2025/05/24 14:01:05] ppocr INFO:     model_type : rec
[2025/05/24 14:01:05] ppocr INFO: Eval : 
[2025/05/24 14:01:05] ppocr INFO:     dataset : 
[2025/05/24 14:01:05] ppocr INFO:         data_dir : paddleocr_data_big/val/
[2025/05/24 14:01:05] ppocr INFO:         label_file_list : ['paddleocr_data_big/gt_val.txt']
[2025/05/24 14:01:05] ppocr INFO:         name : SimpleDataSet
[2025/05/24 14:01:05] ppocr INFO:         transforms : 
[2025/05/24 14:01:05] ppocr INFO:             DecodeImage : 
[2025/05/24 14:01:05] ppocr INFO:                 channel_first : False
[2025/05/24 14:01:05] ppocr INFO:                 img_mode : BGR
[2025/05/24 14:01:05] ppocr INFO:             CTCLabelEncode : None
[2025/05/24 14:01:05] ppocr INFO:             RecResizeImg : 
[2025/05/24 14:01:05] ppocr INFO:                 image_shape : [3, 32, 320]
[2025/05/24 14:01:05] ppocr INFO:             KeepKeys : 
[2025/05/24 14:01:05] ppocr INFO:                 keep_keys : ['image', 'label', 'length']
[2025/05/24 14:01:05] ppocr INFO:     loader : 
[2025/05/24 14:01:05] ppocr INFO:         batch_size_per_card : 128
[2025/05/24 14:01:05] ppocr INFO:         drop_last : False
[2025/05/24 14:01:05] ppocr INFO:         num_workers : 32
[2025/05/24 14:01:05] ppocr INFO:         shuffle : False
[2025/05/24 14:01:05] ppocr INFO: Global : 
[2025/05/24 14:01:05] ppocr INFO:     cal_metric_during_train : True
[2025/05/24 14:01:05] ppocr INFO:     character_dict_path : dict.txt
[2025/05/24 14:01:05] ppocr INFO:     character_type : None
[2025/05/24 14:01:05] ppocr INFO:     checkpoints : None
[2025/05/24 14:01:05] ppocr INFO:     distributed : True
[2025/05/24 14:01:05] ppocr INFO:     epoch_num : 2000
[2025/05/24 14:01:05] ppocr INFO:     eval_batch_step : [0, 2000]
[2025/05/24 14:01:05] ppocr INFO:     infer_img : None
[2025/05/24 14:01:05] ppocr INFO:     infer_mode : False
[2025/05/24 14:01:05] ppocr INFO:     log_smooth_window : 20
[2025/05/24 14:01:05] ppocr INFO:     max_text_length : 25
[2025/05/24 14:01:05] ppocr INFO:     pretrained_model : None
[2025/05/24 14:01:05] ppocr INFO:     print_batch_step : 10
[2025/05/24 14:01:05] ppocr INFO:     save_epoch_step : 3
[2025/05/24 14:01:05] ppocr INFO:     save_inference_dir : None
[2025/05/24 14:01:05] ppocr INFO:     save_model_dir : ./output/rec_multi_language_lite2
[2025/05/24 14:01:05] ppocr INFO:     use_gpu : True
[2025/05/24 14:01:05] ppocr INFO:     use_space_char : True
[2025/05/24 14:01:05] ppocr INFO:     use_visualdl : False
[2025/05/24 14:01:05] ppocr INFO: Loss : 
[2025/05/24 14:01:05] ppocr INFO:     name : CTCLoss
[2025/05/24 14:01:05] ppocr INFO: Metric : 
[2025/05/24 14:01:05] ppocr INFO:     main_indicator : acc
[2025/05/24 14:01:05] ppocr INFO:     name : RecMetric
[2025/05/24 14:01:05] ppocr INFO: Optimizer : 
[2025/05/24 14:01:05] ppocr INFO:     beta1 : 0.9
[2025/05/24 14:01:05] ppocr INFO:     beta2 : 0.999
[2025/05/24 14:01:05] ppocr INFO:     lr : 
[2025/05/24 14:01:05] ppocr INFO:         learning_rate : 0.001
[2025/05/24 14:01:05] ppocr INFO:         name : Cosine
[2025/05/24 14:01:05] ppocr INFO:     name : Adam
[2025/05/24 14:01:05] ppocr INFO:     regularizer : 
[2025/05/24 14:01:05] ppocr INFO:         factor : 1e-05
[2025/05/24 14:01:05] ppocr INFO:         name : L2
[2025/05/24 14:01:05] ppocr INFO: PostProcess : 
[2025/05/24 14:01:05] ppocr INFO:     name : CTCLabelDecode
[2025/05/24 14:01:05] ppocr INFO: Train : 
[2025/05/24 14:01:05] ppocr INFO:     dataset : 
[2025/05/24 14:01:05] ppocr INFO:         data_dir : paddleocr_data_big/train/
[2025/05/24 14:01:05] ppocr INFO:         label_file_list : ['paddleocr_data_big/gt_train.txt']
[2025/05/24 14:01:05] ppocr INFO:         name : SimpleDataSet
[2025/05/24 14:01:05] ppocr INFO:         transforms : 
[2025/05/24 14:01:05] ppocr INFO:             DecodeImage : 
[2025/05/24 14:01:05] ppocr INFO:                 channel_first : False
[2025/05/24 14:01:05] ppocr INFO:                 img_mode : BGR
[2025/05/24 14:01:05] ppocr INFO:             CTCLabelEncode : None
[2025/05/24 14:01:05] ppocr INFO:             RecResizeImg : 
[2025/05/24 14:01:05] ppocr INFO:                 image_shape : [3, 32, 320]
[2025/05/24 14:01:05] ppocr INFO:             KeepKeys : 
[2025/05/24 14:01:05] ppocr INFO:                 keep_keys : ['image', 'label', 'length']
[2025/05/24 14:01:05] ppocr INFO:     loader : 
[2025/05/24 14:01:05] ppocr INFO:         batch_size_per_card : 128
[2025/05/24 14:01:05] ppocr INFO:         drop_last : True
[2025/05/24 14:01:05] ppocr INFO:         num_workers : 32
[2025/05/24 14:01:05] ppocr INFO:         shuffle : True
[2025/05/24 14:01:05] ppocr INFO: profiler_options : None
[2025/05/24 14:01:05] ppocr INFO: train with paddle 3.0.0-rc1 and device Place(gpu:0)
======================= Modified FLAGS detected =======================
FLAGS(name='FLAGS_cupti_dir', current_value='/usr/local/lib/python3.10/dist-packages/paddle/../nvidia/cuda_cupti/lib', default_value='')
FLAGS(name='FLAGS_curand_dir', current_value='/usr/local/lib/python3.10/dist-packages/paddle/../nvidia/curand/lib', default_value='')
FLAGS(name='FLAGS_cusparse_dir', current_value='/usr/local/lib/python3.10/dist-packages/paddle/../nvidia/cusparse/lib', default_value='')
FLAGS(name='FLAGS_nvidia_package_dir', current_value='/usr/local/lib/python3.10/dist-packages/paddle/../nvidia', default_value='')
FLAGS(name='FLAGS_selected_gpus', current_value='0', default_value='')
FLAGS(name='FLAGS_cublas_dir', current_value='/usr/local/lib/python3.10/dist-packages/paddle/../nvidia/cublas/lib', default_value='')
FLAGS(name='FLAGS_cudnn_dir', current_value='/usr/local/lib/python3.10/dist-packages/paddle/../nvidia/cudnn/lib', default_value='')
FLAGS(name='FLAGS_cusolver_dir', current_value='/usr/local/lib/python3.10/dist-packages/paddle/../nvidia/cusolver/lib', default_value='')
FLAGS(name='FLAGS_nccl_dir', current_value='/usr/local/lib/python3.10/dist-packages/paddle/../nvidia/nccl/lib', default_value='')
FLAGS(name='FLAGS_enable_pir_in_executor', current_value=True, default_value=False)
=======================================================================
I0524 14:01:05.860921 647717 tcp_utils.cc:181] The server starts to listen on IP_ANY:51186
I0524 14:01:05.861115 647717 tcp_utils.cc:130] Successfully connected to 10.6.11.1:51186
I0524 14:01:08.924443 647717 process_group_nccl.cc:151] ProcessGroupNCCL pg_timeout_ 1800000
I0524 14:01:08.924511 647717 process_group_nccl.cc:152] ProcessGroupNCCL nccl_comm_init_option_ 0
[2025/05/24 14:01:08] ppocr INFO: Initialize indexes of datasets:['paddleocr_data_big/gt_train.txt']
[2025/05/24 14:01:13] ppocr INFO: Initialize indexes of datasets:['paddleocr_data_big/gt_val.txt']
W0524 14:01:14.369681 647717 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.2, Runtime API Version: 11.8
W0524 14:01:14.371297 647717 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
[2025/05/24 14:01:14] ppocr INFO: train dataloader has 7095 iters
[2025/05/24 14:01:14] ppocr INFO: valid dataloader has 7096 iters
[2025/05/24 14:01:14] ppocr INFO: train from scratch
[2025/05/24 14:01:15] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 2000 iterations
[2025/05/24 14:01:29] ppocr INFO: epoch: [1/2000], global_step: 10, lr: 0.001000, acc: 0.000000, norm_edit_dis: 0.000000, loss: 264.188011, avg_reader_cost: 1.04500 s, avg_batch_cost: 1.34077 s, avg_samples: 128.0, ips: 95.46769 samples/s, eta: 220 days, 4:51:20, max_mem_reserved: 2760 MB, max_mem_allocated: 2695 MB
[2025/05/24 14:01:30] ppocr INFO: epoch: [1/2000], global_step: 20, lr: 0.001000, acc: 0.000000, norm_edit_dis: 0.000000, loss: 166.015335, avg_reader_cost: 0.00414 s, avg_batch_cost: 0.09793 s, avg_samples: 128.0, ips: 1306.99844 samples/s, eta: 118

So training from scratch seems to start correctly. However, after exporting an inference model and running it, the model outputs more classes than there are entries in my dict.txt, which causes an index out of range error, as seen here:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[4], line 29
     26 image = Image.open(image_path).convert("RGB")
     27 img_width, img_height = image.size  # Get image dimensions
---> 29 ocr_results = ocr.ocr(image_path, cls=True)
     31 words = []
     32 word_boxes = []

File ~/miniconda3/envs/MinerU/lib/python3.10/site-packages/paddleocr/paddleocr.py:668, in PaddleOCR.ocr(self, img, det, rec, cls, bin, inv, alpha_color)
    666 for idx, img in enumerate(imgs):
    667     img = preprocess_image(img)
--> 668     dt_boxes, rec_res, _ = self.__call__(img, cls)
    669     if not dt_boxes and not rec_res:
    670         ocr_res.append(None)

File ~/miniconda3/envs/MinerU/lib/python3.10/site-packages/paddleocr/tools/infer/predict_system.py:105, in TextSystem.__call__(self, img, cls)
    101     time_dict['cls'] = elapse
    102     logger.debug("cls num  : {}, elapsed : {}".format(
    103         len(img_crop_list), elapse))
--> 105 rec_res, elapse = self.text_recognizer(img_crop_list)
    106 time_dict['rec'] = elapse
    107 logger.debug("rec_res num  : {}, elapsed : {}".format(
    108     len(rec_res), elapse))

File ~/miniconda3/envs/MinerU/lib/python3.10/site-packages/paddleocr/tools/infer/predict_rec.py:628, in TextRecognizer.__call__(self, img_list)
    626             preds = outputs[0]
    627         self.predictor.try_shrink_memory()
--> 628 rec_result = self.postprocess_op(preds)
    629 for rno in range(len(rec_result)):
    630     rec_res[indices[beg_img_no + rno]] = rec_result[rno]

File ~/miniconda3/envs/MinerU/lib/python3.10/site-packages/paddleocr/ppocr/postprocess/rec_postprocess.py:124, in CTCLabelDecode.__call__(self, preds, label, *args, **kwargs)
    122 preds_idx = preds.argmax(axis=2)
    123 preds_prob = preds.max(axis=2)
--> 124 text = self.decode(preds_idx, preds_prob, is_remove_duplicate=True)
    125 if label is None:
    126     return text

File ~/miniconda3/envs/MinerU/lib/python3.10/site-packages/paddleocr/ppocr/postprocess/rec_postprocess.py:84, in BaseRecLabelDecode.decode(self, text_index, text_prob, is_remove_duplicate)
     82 print("Max text_id in this batch:", text_index[batch_idx][selection].max())
     83 print("self.character size:", len(self.character))
---> 84 char_list = [
     85     self.character[text_id]
     86     for text_id in text_index[batch_idx][selection]
     87 ]
     88 if text_prob is not None:
     89     conf_list = text_prob[batch_idx][selection]

File ~/miniconda3/envs/MinerU/lib/python3.10/site-packages/paddleocr/ppocr/postprocess/rec_postprocess.py:85, in <listcomp>(.0)
     82 print("Max text_id in this batch:", text_index[batch_idx][selection].max())
     83 print("self.character size:", len(self.character))
     84 char_list = [
---> 85     self.character[text_id]
     86     for text_id in text_index[batch_idx][selection]
     87 ]
     88 if text_prob is not None:
     89     conf_list = text_prob[batch_idx][selection]

IndexError: list index out of range
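
For reference, a minimal sketch of the greedy CTC decoding step that fails in the traceback above. It is not PaddleOCR's code: `greedy_ctc_decode` is a hypothetical helper, and it assumes the blank token sits at index 0 of the lookup table. It collapses repeated ids, drops blanks, and raises a descriptive error instead of a bare `IndexError` when a predicted id falls outside the table.

```python
def greedy_ctc_decode(step_ids, characters, blank_id=0):
    """Collapse repeats, drop blanks, and map class ids to characters.

    step_ids:   per-time-step argmax class ids for one sample.
    characters: lookup table; characters[blank_id] is the CTC blank (assumed).
    """
    table_size = len(characters)
    decoded = []
    previous = None
    for step, class_id in enumerate(step_ids):
        # An id outside the table means the model head is wider than the
        # dictionary used for decoding.
        if class_id < 0 or class_id >= table_size:
            raise ValueError(
                f"time step {step}: class id {class_id} is outside "
                f"the character table of size {table_size}"
            )
        # Keep an id only if it is not blank and not a repeat of the last step.
        if class_id != blank_id and class_id != previous:
            decoded.append(characters[class_id])
        previous = class_id
    return "".join(decoded)


# Toy table: blank first, then two characters and a space.
table = ["blank", "a", "b", " "]
decoded_text = greedy_ctc_decode([0, 1, 1, 0, 2, 2, 3], table)
```

Any id at or above `len(characters)` triggers the error, which is the same condition that makes `self.character[text_id]` fail in the traceback.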

Checking my inference.yml, it seems to have the correct number of classes (110):

PreProcess:
  transform_ops:
  - DecodeImage:
      channel_first: false
      img_mode: BGR
  - CTCLabelEncode: null
  - RecResizeImg:
      image_shape:
      - 3
      - 32
      - 320
  - KeepKeys:
      keep_keys:
      - image
      - label
      - length
PostProcess:
  name: CTCLabelDecode
  character_dict:
  - '!'
  - '"'
  - '#'
  - $
  - '%'
  - '&'
  - ''''
  - (
  - )
  - '*'
  - +
  - ','
  - '-'
  - .
  - /
  - '0'
  - '1'
  - '2'
  - '3'
  - '4'
  - '5'
  - '6'
  - '7'
  - '8'
  - '9'
  - ':'
  - ;
  - <
  - '='
  - '>'
  - '?'
  - '@'
  - A
  - B
  - C
  - D
  - E
  - F
  - G
  - H
  - I
  - J
  - K
  - L
  - M
  - N
  - O
  - P
  - Q
  - R
  - S
  - T
  - U
  - V
  - W
  - X
  - Y
  - Z
  - '['
  - ']'
  - _
  - '`'
  - a
  - b
  - c
  - d
  - e
  - f
  - g
  - h
  - i
  - j
  - k
  - l
  - m
  - n
  - o
  - p
  - q
  - r
  - s
  - t
  - u
  - v
  - w
  - x
  - y
  - z
  - '{'
  - '}'
  - £
  - §
  - «
  - °
  - ´
  - ·
  - »
  - Ä
  - Ö
  - Ü
  - ß
  - ä
  - é
  - ö
  - ø
  - ü
  - “
  - „
  - †
  - €

However, the model seems to output 187 classes, since the logits shape is (6, 120, 187). Am I doing anything wrong?
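
A minimal sketch of the consistency check between the dictionary and the head width, under stated assumptions: it assumes the CTC decoder builds its lookup table as one blank token, followed by the dictionary entries, followed by a space when `use_space_char` is true. The helper names `expected_ctc_table` and `check_head_width` are hypothetical and not part of PaddleOCR.

```python
def expected_ctc_table(dict_chars, use_space_char=True):
    """Build the lookup table the decoder is assumed to use."""
    table = list(dict_chars)
    if use_space_char:
        table.append(" ")      # space is appended after the dictionary entries
    return ["blank"] + table   # CTC blank is assumed to be index 0


def check_head_width(logits_shape, dict_chars, use_space_char=True):
    """Compare the last logits dimension with the expected table size.

    Returns (expected, actual, matches).
    """
    expected = len(expected_ctc_table(dict_chars, use_space_char))
    actual = logits_shape[-1]
    return expected, actual, expected == actual


# Placeholder entries stand in for the 110 characters listed above.
placeholder_dict = [f"c{i}" for i in range(110)]
expected, actual, matches = check_head_width((6, 120, 187), placeholder_dict)
print(f"expected {expected} classes, model outputs {actual}, match: {matches}")
```

Under these assumptions, a 110-entry dictionary with `use_space_char: True` corresponds to 112 output classes, so a last dimension of 187 matches neither 110 nor 112. That would suggest the exported model and the dictionary used at inference time were not built from the same character set.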

Thanks in advance
