Paddle_OCR Training - RecursionError: maximum recursion depth exceeded while calling a Python object #13416
-
I was trying to train a PaddleOCR rec model. My train.txt file contains an image dataset labeled with the PPOCRLabel tool.
I got the error below.
How can I fix this error?
Replies: 2 comments 4 replies
-
The dataset you used is for a det model, not for a rec model.
-
Hi, I'm still having this issue. I'm using this config file to train my rec model (configs\rec\PP-OCRv4):
Global:
  debug: false
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_ppocr_v4
  save_epoch_step: 10
  eval_batch_step:
  - 0
  - 2000
  cal_metric_during_train: true
  pretrained_model: null
  checkpoints: null
  save_inference_dir: null
  use_visualdl: false
  infer_img: doc/imgs_en/A.png
  character_dict_path: ppocr/utils/en_dict.txt
  max_text_length: 1
  infer_mode: false
  use_space_char: false
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3.txt
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0005
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05
Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform: null
  Backbone:
    name: PPLCNetV3
    scale: 0.95
  Head:
    name: MultiHead
    head_list:
    - CTCHead:
        Neck:
          name: svtr
          dims: 120
          depth: 2
          hidden_dims: 120
          kernel_size:
          - 1
          - 3
          use_guide: true
        Head:
          fc_decay: 1.0e-05
    - NRTRHead:
        nrtr_dim: 384
        max_text_length: 25
Loss:
  name: MultiLoss
  loss_config_list:
  - CTCLoss: null
  - NRTRLoss: null
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: false
Train:
  dataset:
    name: MultiScaleDataSet
    # name: SimpleDataSet
    ds_width: false
    # data_dir: ./train_data/
    data_dir: ./train_data/kaggle_rec/
    ext_op_transform_idx: 1
    label_file_list:
    # - ./train_data/train_list.txt
    - ./train_data/kaggle_rec/rec_train_symbol.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape:
        - 48
        - 320
        - 3
        max_text_length: 25
    - RecAug: null
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  sampler:
    name: MultiScaleSampler
    scales:
    - - 320
      - 32
    - - 320
      - 48
    - - 320
      - 64
    first_bs: 96
    fix_bs: false
    divided_factor:
    - 8
    - 16
    is_training: true
  loader:
    shuffle: true
    # batch_size_per_card: 96
    # batch_size_per_card: 124
    batch_size_per_card: 62
    drop_last: true
    num_workers: 4
Eval:
  dataset:
    name: SimpleDataSet
    # name: SimpleDataSet
    # data_dir: ./train_data
    data_dir: ./train_data/kaggle_rec/
    label_file_list:
    # - ./train_data/val_list.txt
    - ./train_data/kaggle_rec/rec_test_symbol.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - RecResizeImg:
        image_shape:
        - 3
        - 48
        - 320
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 62
    # batch_size_per_card: 62
    num_workers: 4
profiler_options: null
This is my training set txt file
Result:
[2025/02/21 16:43:00] ppocr WARNING: Skipping import of the encryption module.
[2025/02/21 16:43:00] ppocr INFO: Architecture :
[2025/02/21 16:43:00] ppocr INFO: Backbone :
[2025/02/21 16:43:00] ppocr INFO: name : PPLCNetV3
[2025/02/21 16:43:00] ppocr INFO: scale : 0.95
[2025/02/21 16:43:00] ppocr INFO: Head :
[2025/02/21 16:43:00] ppocr INFO: head_list :
[2025/02/21 16:43:00] ppocr INFO: CTCHead :
[2025/02/21 16:43:00] ppocr INFO: Head :
[2025/02/21 16:43:00] ppocr INFO: fc_decay : 1e-05
[2025/02/21 16:43:00] ppocr INFO: Neck :
[2025/02/21 16:43:00] ppocr INFO: depth : 2
[2025/02/21 16:43:00] ppocr INFO: dims : 120
[2025/02/21 16:43:00] ppocr INFO: hidden_dims : 120
[2025/02/21 16:43:00] ppocr INFO: kernel_size : [1, 3]
[2025/02/21 16:43:00] ppocr INFO: name : svtr
[2025/02/21 16:43:00] ppocr INFO: use_guide : True
[2025/02/21 16:43:00] ppocr INFO: NRTRHead :
[2025/02/21 16:43:00] ppocr INFO: max_text_length : 25
[2025/02/21 16:43:00] ppocr INFO: nrtr_dim : 384
[2025/02/21 16:43:00] ppocr INFO: name : MultiHead
[2025/02/21 16:43:00] ppocr INFO: Transform : None
[2025/02/21 16:43:00] ppocr INFO: algorithm : SVTR_LCNet
[2025/02/21 16:43:00] ppocr INFO: model_type : rec
[2025/02/21 16:43:00] ppocr INFO: Eval :
[2025/02/21 16:43:00] ppocr INFO: dataset :
[2025/02/21 16:43:00] ppocr INFO: data_dir : ./train_data/kaggle_rec/
[2025/02/21 16:43:00] ppocr INFO: label_file_list : ['./train_data/kaggle_rec/rec_test_symbol.txt']
[2025/02/21 16:43:00] ppocr INFO: name : SimpleDataSet
[2025/02/21 16:43:00] ppocr INFO: transforms :
[2025/02/21 16:43:00] ppocr INFO: DecodeImage :
[2025/02/21 16:43:00] ppocr INFO: channel_first : False
[2025/02/21 16:43:00] ppocr INFO: img_mode : BGR
[2025/02/21 16:43:00] ppocr INFO: MultiLabelEncode :
[2025/02/21 16:43:00] ppocr INFO: gtc_encode : NRTRLabelEncode
[2025/02/21 16:43:00] ppocr INFO: RecResizeImg :
[2025/02/21 16:43:00] ppocr INFO: image_shape : [3, 48, 320]
[2025/02/21 16:43:00] ppocr INFO: KeepKeys :
[2025/02/21 16:43:00] ppocr INFO: keep_keys : ['image', 'label_ctc', 'label_gtc', 'length', 'valid_ratio']
[2025/02/21 16:43:00] ppocr INFO: loader :
[2025/02/21 16:43:00] ppocr INFO: batch_size_per_card : 62
[2025/02/21 16:43:00] ppocr INFO: drop_last : False
[2025/02/21 16:43:00] ppocr INFO: num_workers : 4
[2025/02/21 16:43:00] ppocr INFO: shuffle : False
[2025/02/21 16:43:00] ppocr INFO: Global :
[2025/02/21 16:43:00] ppocr INFO: cal_metric_during_train : True
[2025/02/21 16:43:00] ppocr INFO: character_dict_path : ppocr/utils/en_dict.txt
[2025/02/21 16:43:00] ppocr INFO: checkpoints : ./pretrain_models/en_PP-OCRv4_rec_train/best_accuracy
[2025/02/21 16:43:00] ppocr INFO: debug : False
[2025/02/21 16:43:00] ppocr INFO: distributed : False
[2025/02/21 16:43:00] ppocr INFO: epoch_num : 500
[2025/02/21 16:43:00] ppocr INFO: eval_batch_step : [0, 2000]
[2025/02/21 16:43:00] ppocr INFO: infer_img : doc/imgs_en/A.png
[2025/02/21 16:43:00] ppocr INFO: infer_mode : False
[2025/02/21 16:43:00] ppocr INFO: log_smooth_window : 20
[2025/02/21 16:43:00] ppocr INFO: max_text_length : 1
[2025/02/21 16:43:00] ppocr INFO: pretrained_model : None
[2025/02/21 16:43:00] ppocr INFO: print_batch_step : 10
[2025/02/21 16:43:00] ppocr INFO: save_epoch_step : 10
[2025/02/21 16:43:00] ppocr INFO: save_inference_dir : None
[2025/02/21 16:43:00] ppocr INFO: save_model_dir : ./output/rec_ppocr_v4
[2025/02/21 16:43:00] ppocr INFO: save_res_path : ./output/rec/predicts_ppocrv3.txt
[2025/02/21 16:43:00] ppocr INFO: use_gpu : True
[2025/02/21 16:43:00] ppocr INFO: use_space_char : False
[2025/02/21 16:43:00] ppocr INFO: use_visualdl : False
[2025/02/21 16:43:00] ppocr INFO: Loss :
[2025/02/21 16:43:00] ppocr INFO: loss_config_list :
[2025/02/21 16:43:00] ppocr INFO: CTCLoss : None
[2025/02/21 16:43:00] ppocr INFO: NRTRLoss : None
[2025/02/21 16:43:00] ppocr INFO: name : MultiLoss
[2025/02/21 16:43:00] ppocr INFO: Metric :
[2025/02/21 16:43:00] ppocr INFO: ignore_space : False
[2025/02/21 16:43:00] ppocr INFO: main_indicator : acc
[2025/02/21 16:43:00] ppocr INFO: name : RecMetric
[2025/02/21 16:43:00] ppocr INFO: Optimizer :
[2025/02/21 16:43:00] ppocr INFO: beta1 : 0.9
[2025/02/21 16:43:00] ppocr INFO: beta2 : 0.999
[2025/02/21 16:43:00] ppocr INFO: lr :
[2025/02/21 16:43:00] ppocr INFO: learning_rate : 0.0005
[2025/02/21 16:43:00] ppocr INFO: name : Cosine
[2025/02/21 16:43:00] ppocr INFO: warmup_epoch : 5
[2025/02/21 16:43:00] ppocr INFO: name : Adam
[2025/02/21 16:43:00] ppocr INFO: regularizer :
[2025/02/21 16:43:00] ppocr INFO: factor : 3e-05
[2025/02/21 16:43:00] ppocr INFO: name : L2
[2025/02/21 16:43:00] ppocr INFO: PostProcess :
[2025/02/21 16:43:00] ppocr INFO: name : CTCLabelDecode
[2025/02/21 16:43:00] ppocr INFO: Train :
[2025/02/21 16:43:00] ppocr INFO: dataset :
[2025/02/21 16:43:00] ppocr INFO: data_dir : ./train_data/kaggle_rec/
[2025/02/21 16:43:00] ppocr INFO: ds_width : False
[2025/02/21 16:43:00] ppocr INFO: ext_op_transform_idx : 1
[2025/02/21 16:43:00] ppocr INFO: label_file_list : ['./train_data/kaggle_rec/rec_train_symbol.txt']
[2025/02/21 16:43:00] ppocr INFO: name : MultiScaleDataSet
[2025/02/21 16:43:00] ppocr INFO: transforms :
[2025/02/21 16:43:00] ppocr INFO: DecodeImage :
[2025/02/21 16:43:00] ppocr INFO: channel_first : False
[2025/02/21 16:43:00] ppocr INFO: img_mode : BGR
[2025/02/21 16:43:00] ppocr INFO: RecConAug :
[2025/02/21 16:43:00] ppocr INFO: ext_data_num : 2
[2025/02/21 16:43:00] ppocr INFO: image_shape : [48, 320, 3]
[2025/02/21 16:43:00] ppocr INFO: max_text_length : 25
[2025/02/21 16:43:00] ppocr INFO: prob : 0.5
[2025/02/21 16:43:00] ppocr INFO: RecAug : None
[2025/02/21 16:43:00] ppocr INFO: MultiLabelEncode :
[2025/02/21 16:43:00] ppocr INFO: gtc_encode : NRTRLabelEncode
[2025/02/21 16:43:00] ppocr INFO: KeepKeys :
[2025/02/21 16:43:00] ppocr INFO: keep_keys : ['image', 'label_ctc', 'label_gtc', 'length', 'valid_ratio']
[2025/02/21 16:43:00] ppocr INFO: loader :
[2025/02/21 16:43:00] ppocr INFO: batch_size_per_card : 62
[2025/02/21 16:43:00] ppocr INFO: drop_last : True
[2025/02/21 16:43:00] ppocr INFO: num_workers : 4
[2025/02/21 16:43:00] ppocr INFO: shuffle : True
[2025/02/21 16:43:00] ppocr INFO: sampler :
[2025/02/21 16:43:00] ppocr INFO: divided_factor : [8, 16]
[2025/02/21 16:43:00] ppocr INFO: first_bs : 96
[2025/02/21 16:43:00] ppocr INFO: fix_bs : False
[2025/02/21 16:43:00] ppocr INFO: is_training : True
[2025/02/21 16:43:00] ppocr INFO: name : MultiScaleSampler
[2025/02/21 16:43:00] ppocr INFO: scales : [[320, 32], [320, 48], [320, 64]]
[2025/02/21 16:43:00] ppocr INFO: profiler_options : None
[2025/02/21 16:43:00] ppocr INFO: train with paddle 2.6.1 and device Place(gpu:0)
[2025/02/21 16:43:00] ppocr INFO: Initialize indexs of datasets:['./train_data/kaggle_rec/rec_train_symbol.txt']
[2025/02/21 16:43:00] ppocr INFO: Initialize indexs of datasets:['./train_data/kaggle_rec/rec_test_symbol.txt']
W0221 16:43:00.761966 47820 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.6, Runtime API Version: 11.7
W0221 16:43:00.767946 47820 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
[2025/02/21 16:43:02] ppocr INFO: train dataloader has 30 iters
[2025/02/21 16:43:02] ppocr INFO: valid dataloader has 533 iters
[2025/02/21 16:43:02] ppocr WARNING: The shape of model params head.ctc_head.fc.weight [120, 98] not matched with loaded params shape [120, 97] !
[2025/02/21 16:43:02] ppocr WARNING: The shape of model params head.ctc_head.fc.bias [98] not matched with loaded params shape [97] !
[2025/02/21 16:43:02] ppocr WARNING: The shape of model params head.gtc_head.embedding.embedding.weight [102, 384] not matched with loaded params shape [101, 384] !
[2025/02/21 16:43:02] ppocr WARNING: The shape of model params head.gtc_head.tgt_word_prj.weight [384, 102] not matched with loaded params shape [384, 101] !
[2025/02/21 16:43:02] ppocr INFO: resume from ./pretrain_models/en_PP-OCRv4_rec_train/best_accuracy
[2025/02/21 16:43:02] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 2000 iterations
Exception in thread Thread-1 (_thread_loop):
Traceback (most recent call last):
File "C:\Python Projects\PaddleOCR\ppocr\data\simple_dataset.py", line 238, in __getitem__
outs = transform(data, self.ops[:-1])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python Projects\PaddleOCR\ppocr\data\imaug\__init__.py", line 73, in transform
data = op(data)
^^^^^^^^
File "C:\Python Projects\PaddleOCR\ppocr\data\imaug\rec_img_aug.py", line 58, in __call__
img = tia_distort(img, random.randint(3, 6))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python Projects\PaddleOCR\ppocr\data\imaug\text_image_aug\augment.py", line 63, in tia_distort
dst = trans.generate()
^^^^^^^^^^^^^^^^
File "C:\Python Projects\PaddleOCR\ppocr\data\imaug\text_image_aug\warp_mls.py", line 41, in generate
return self.gen_img()
^^^^^^^^^^^^^^
File "C:\Python Projects\PaddleOCR\ppocr\data\imaug\text_image_aug\warp_mls.py", line 162, in gen_img
nx = np.clip(nx, 0, src_w - 1)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lian Jiet\AppData\Local\Programs\Python\Python312\Lib\site-packages\numpy\core\fromnumeric.py", line 2169, in clip
return _wrapfunc(a, 'clip', a_min, a_max, out=out, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lian Jiet\AppData\Local\Programs\Python\Python312\Lib\site-packages\numpy\core\fromnumeric.py", line 59, in _wrapfunc
return bound(*args, **kwds)
^^^^^^^^^^^^^^^^^^^^
RecursionError: maximum recursion depth exceeded
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Lian Jiet\AppData\Local\Programs\Python\Python312\Lib\threading.py", line 1075, in _bootstrap_inner
self.run()
File "C:\Users\Lian Jiet\AppData\Local\Programs\Python\Python312\Lib\threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\Lian Jiet\AppData\Local\Programs\Python\Python312\Lib\site-packages\paddle\io\dataloader\dataloader_iter.py", line 235, in _thread_loop
batch = self._dataset_fetcher.fetch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lian Jiet\AppData\Local\Programs\Python\Python312\Lib\site-packages\paddle\io\dataloader\fetcher.py", line 77, in fetch
data.append(self.dataset[idx])
~~~~~~~~~~~~^^^^^
File "C:\Python Projects\PaddleOCR\ppocr\data\simple_dataset.py", line 252, in __getitem__
return self.__getitem__([img_width, img_height, rnd_idx, wh_ratio])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python Projects\PaddleOCR\ppocr\data\simple_dataset.py", line 252, in __getitem__
return self.__getitem__([img_width, img_height, rnd_idx, wh_ratio])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python Projects\PaddleOCR\ppocr\data\simple_dataset.py", line 252, in __getitem__
return self.__getitem__([img_width, img_height, rnd_idx, wh_ratio])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Previous line repeated 984 more times]
File "C:\Python Projects\PaddleOCR\ppocr\data\simple_dataset.py", line 245, in __getitem__
data_line, traceback.format_exc()
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lian Jiet\AppData\Local\Programs\Python\Python312\Lib\traceback.py", line 184, in format_exc
return "".join(format_exception(sys.exception(), limit=limit, chain=chain))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lian Jiet\AppData\Local\Programs\Python\Python312\Lib\traceback.py", line 139, in format_exception
te = TracebackException(type(value), value, tb, limit=limit, compact=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lian Jiet\AppData\Local\Programs\Python\Python312\Lib\traceback.py", line 733, in __init__
self.stack = StackSummary._extract_from_extended_frame_gen(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lian Jiet\AppData\Local\Programs\Python\Python312\Lib\traceback.py", line 438, in _extract_from_extended_frame_gen
f.line
File "C:\Users\Lian Jiet\AppData\Local\Programs\Python\Python312\Lib\traceback.py", line 323, in line
self._line = linecache.getline(self.filename, self.lineno)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lian Jiet\AppData\Local\Programs\Python\Python312\Lib\linecache.py", line 30, in getline
lines = getlines(filename, module_globals)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lian Jiet\AppData\Local\Programs\Python\Python312\Lib\linecache.py", line 46, in getlines
return updatecache(filename, module_globals)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RecursionError: maximum recursion depth exceeded
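The traceback above shows `__getitem__` in simple_dataset.py calling itself at line 252 about a thousand times before Python gives up. A minimal sketch of that pattern follows. It is not PaddleOCR's actual code: `ToyDataset`, `load`, and `hits_recursion_limit` are hypothetical names, and the rejection rule (labels longer than a length limit) is only an assumed example of how every sample could fail.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset whose __getitem__ retries on failure."""

    def __init__(self, samples, max_text_length):
        self.samples = samples
        self.max_text_length = max_text_length

    def __len__(self):
        return len(self.samples)

    def load(self, idx):
        # Assumed rejection rule: labels longer than the limit are unusable.
        text = self.samples[idx]
        return text if len(text) <= self.max_text_length else None

    def __getitem__(self, idx):
        out = self.load(idx)
        if out is None:
            # Retry with the next sample. If every sample is rejected,
            # this never terminates and the interpreter's stack limit is hit.
            return self[(idx + 1) % len(self)]
        return out


def hits_recursion_limit(dataset):
    """Return True if fetching the first sample ends in a RecursionError."""
    try:
        dataset[0]
    except RecursionError:
        return True
    return False
```

If this reading is right, raising the recursion limit would not help: the retry only stops once at least one sample loads successfully, so the label file and config are the things to check.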
The datasets used for recognition and detection models are different. For recognition models you should crop the license plate (lp) part and then label it. When I used full-sized images containing cars and license plates, I got the recursion error (that dataset is for detection training, not recognition).
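The cropping step could be sketched roughly as below. This assumes each detection-style label line has the form `<image path>`, a tab, then a JSON list whose items carry a `transcription` string and `points` corners. The function names are hypothetical, and skipping `###` entries is an assumption about how unreadable regions are marked.

```python
import json


def det_line_to_rec_entries(line):
    """Split one detection-style label line into (image_path, box, text) tuples."""
    image_path, annotations = line.rstrip("\r\n").split("\t", 1)
    entries = []
    for item in json.loads(annotations):
        text = item["transcription"]
        if not text or text == "###":  # assumed marker for unreadable regions
            continue
        xs = [point[0] for point in item["points"]]
        ys = [point[1] for point in item["points"]]
        # Axis-aligned box (left, upper, right, lower) around the polygon.
        box = (int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys)))
        entries.append((image_path, box, text))
    return entries


def rec_label_line(crop_path, text):
    """Format one recognition label line: crop path, a tab, then the text."""
    return f"{crop_path}\t{text}"
```

Each box is in (left, upper, right, lower) order, so it can be handed to an image library's crop call (Pillow's `Image.crop` accepts a box in that order). The saved crop path and its text then go into `rec_label_line`.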
For detection, I can use the Label.txt file generated by PPOCRLabel directly.
But for recognition I should crop each image to fit just the text, and each line of the label file should then be: the image file path, a tab character, then the text.
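A quick check of a recognition label file before training might look like the sketch below. `check_rec_label_lines` is a hypothetical helper, not part of PaddleOCR. The `max_text_length` and `charset` checks are assumptions meant to mirror the `max_text_length` and `character_dict_path` settings in the config; the library may treat such lines differently.

```python
def check_rec_label_lines(lines, max_text_length, charset=None):
    """Return (line_number, reason) tuples for lines that look malformed."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        parts = line.split("\t")
        if len(parts) != 2:
            problems.append((number, "expected exactly one tab"))
            continue
        path, text = parts
        if not path.strip():
            problems.append((number, "empty image path"))
        elif not text:
            problems.append((number, "empty text"))
        elif len(text) > max_text_length:
            problems.append((number, "text longer than max_text_length"))
        elif charset is not None and any(ch not in charset for ch in text):
            problems.append((number, "character not in dictionary"))
    return problems
```

For a real file, the lines could come from `open(path, encoding="utf-8")`. A detection-style line would typically be flagged here because its JSON payload is far longer than any plausible text limit.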
Do detection training first, then recognition training. Don't waste time doing recognition training first.