Dimension mismatch issue #16178

Maaaryooo · 2025-08-01T09:49:26Z

Paddle version: 3.1.0
Config: /PaddleOCR-main/configs/rec/PP-OCRv5/PP-OCRv5_server_rec.yml
Dataset:
1. More than 20,000 images
2. Each image is 320 pixels wide by 32 pixels high
3. Each image shows black text on a white background and contains exactly one Chinese character
4. Image file names follow a convention like char_4E00
Pretrained model: PP-OCRv5_server_rec_pretrained.pdparams
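
For reference, a naming scheme like char_4E00 can be turned into recognition labels mechanically. The sketch below is only an illustration under two assumptions: that the part after `char_` is the character's hexadecimal Unicode code point, and that each label line has the tab-separated `path<TAB>label` layout. The helper names and the `.png` extension are hypothetical.

```python
from pathlib import Path


def label_from_filename(filename: str) -> str:
    """Map a name like 'char_4E00.png' to the character U+4E00.

    Assumption: the part after 'char_' is a hexadecimal Unicode code point.
    """
    stem = Path(filename).stem  # e.g. 'char_4E00'
    prefix, _, hex_code = stem.partition("_")
    if prefix != "char" or not hex_code:
        raise ValueError(f"unexpected file name: {filename!r}")
    return chr(int(hex_code, 16))


def label_line(filename: str) -> str:
    """Build one 'path<TAB>label' line for a recognition label file."""
    return f"{filename}\t{label_from_filename(filename)}"


if __name__ == "__main__":
    # Hypothetical file name; the extension is an assumption.
    print(label_line("char_4E00.png"))
```

With `max_text_length: 1`, every label produced this way is a single character, and each of those characters would also need an entry in the dictionary file.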

I modified the YAML config for this dataset as follows:
```yaml
Global:
  model_name: PP-OCRv5_server_rec # To use static model for inference.
  debug: false
  use_gpu: true
  epoch_num: 75
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec/ppocrv5_ch
  save_epoch_step: 1
  eval_batch_step: [0, 2000]
  cal_metric_during_train: true
  calc_epoch_interval: 1
  pretrained_model: /home/aistudio/work/PaddleOCR-main/model/PP-OCRv5_server_rec_pretrained.pdparams
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: /home/aistudio/work/PaddleOCR-main/proc-data/train_data/ch_dict.txt
  max_text_length: 1
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv5.txt
  d2s_train_image_shape: [3, 32, 320]

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0005
    warmup_epoch: 1
  regularizer:
    name: L2
    factor: 3.0e-05

Architecture:
  model_type: rec
  algorithm: SVTR_HGNet
  Transform:
  Backbone:
    name: PPHGNetV2_B4
    text_rec: True
  Head:
    name: CTCHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 120
            depth: 2
            hidden_dims: 120
            kernel_size: [1, 3]
            use_guide: True
          Head:
            fc_decay: 0.00001

Loss:
  name: CTCLoss

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: /home/aistudio/work/PaddleOCR-main/proc-data/train_data/
    ext_op_transform_idx: 1
    label_file_list:
      - /home/aistudio/work/PaddleOCR-main/proc-data/train_data/train_list.txt
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - KeepKeys:
          keep_keys:
            - image
            - label
  sampler:
    name: MultiScaleSampler
    scales: [[320, 32]]
    first_bs: &bs 128
    fix_bs: false
    divided_factor: [8, 16] # w, h
    is_training: True
  loader:
    shuffle: true
    batch_size_per_card: 256
    drop_last: true
    num_workers: 16
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/work/PaddleOCR-main/proc-data/train_data/
    label_file_list:
      - /home/aistudio/work/PaddleOCR-main/proc-data/train_data/val_list.txt
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - RecResizeImg:
          image_shape: [3, 32, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 512
    num_workers: 4
```

Training with this configuration file fails with the following error:
```
Traceback (most recent call last):
  File "/home/aistudio/work/PaddleOCR-main/tools/train.py", line 272, in <module>
    main(config, device, logger, vdl_writer, seed)
  File "/home/aistudio/work/PaddleOCR-main/tools/train.py", line 225, in main
    program.train(
  File "/home/aistudio/work/PaddleOCR-main/tools/program.py", line 356, in train
    preds = model(images, data=batch[1:])
  File "/opt/conda/envs/pure-paddle/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1571, in __call__
    return self.forward(*inputs, **kwargs)
  File "/home/aistudio/work/PaddleOCR-main/ppocr/modeling/architectures/base_model.py", line 99, in forward
    x = self.head(x, targets=data)
  File "/opt/conda/envs/pure-paddle/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1571, in __call__
    return self.forward(*inputs, **kwargs)
  File "/home/aistudio/work/PaddleOCR-main/ppocr/modeling/heads/rec_ctc_head.py", line 79, in forward
    predicts = self.fc(x)
  File "/opt/conda/envs/pure-paddle/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1571, in __call__
    return self.forward(*inputs, **kwargs)
  File "/opt/conda/envs/pure-paddle/lib/python3.10/site-packages/paddle/nn/layer/common.py", line 223, in forward
    out = F.linear(
  File "/opt/conda/envs/pure-paddle/lib/python3.10/site-packages/paddle/nn/functional/common.py", line 2310, in linear
    return _C_ops.linear(x, weight, bias)
ValueError: (InvalidArgument) Input(Y) has error dim. Y'dims[0] must be equal to 40, but received Y'dims[0] is 2048.
  [Hint: Expected y_dims[y_ndim - 2] == K, but received y_dims[y_ndim - 2]:2048 != K:40.] (at ../paddle/phi/kernels/impl/matmul_kernel_impl.h:332)
  [operator < linear > error]
```
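
One way to read the message: a linear layer multiplies the last axis of its input by the first axis of its weight, so here the `fc` weight expects 2048 input features while the tensor reaching it has a last axis of 40 (note that 40 = 320 / 8). The sketch below reproduces only that shape rule in plain Python. It assumes, without confirmation from the traceback, that the tensor is still a `[N, 2048, 1, 40]` feature map that was never turned into a `[N, 40, 2048]` sequence; the batch size, that layout, and the class count are illustrative values, and both helper names are hypothetical.

```python
def linear_out_shape(x_shape, w_shape):
    """Shape rule for y = x @ W: the last axis of x must equal W's first axis."""
    k = x_shape[-1]
    if w_shape[0] != k:
        raise ValueError(
            f"W dims[0] must be equal to {k}, but received {w_shape[0]}"
        )
    return (*x_shape[:-1], w_shape[1])


def feature_map_to_sequence(shape):
    """[N, C, 1, W] -> [N, W, C]: drop the height axis, use width as time steps."""
    n, c, h, w = shape
    if h != 1:
        raise ValueError("expected a feature map of height 1")
    return (n, w, c)


num_classes = 100                  # illustrative, not the real dictionary size
weight = (2048, num_classes)       # fc weight: [in_features, out_features]
feature_map = (128, 2048, 1, 40)   # assumed layout: [N, C, H, W]

try:
    linear_out_shape(feature_map, weight)
except ValueError as err:
    print("feature map fed directly:", err)

sequence = feature_map_to_sequence(feature_map)
print("after converting to a sequence:", linear_out_shape(sequence, weight))
```

If this reading is right, it may be worth checking whether the `Neck` settings nested under `head_list` are actually applied when `Head.name` is set to `CTCHead`. This is a guess based on the shapes alone, not a confirmed cause.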

I have run out of ideas for what else to change. I would appreciate any suggestions from the community.

zhangyubo0722 (Collaborator) · 2025-08-18T07:18:14Z

Could you share a sample of your data so I can locate the problem?
