Error when training a quantized distillation model in PaddleOCR: Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [64, 40, 12] and the shape of Y = [64, 80, 12]. Received [40] in X is not equal to [80] in Y at i:1 #14981

leesimeng · 2025-04-07T05:50:17Z


The error is as follows:
[2025/04/05 14:19:51] ppocr INFO: train from scratch
[2025/04/05 14:19:51] ppocr INFO: train from scratch
[2025/04/05 14:19:51] ppocr INFO: train dataloader has 43 iters, valid dataloader has 8 iters
[2025/04/05 14:19:51] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 200 iterations
Traceback (most recent call last):
  File "/mnt/workspace/PaddleOCR/deploy/slim/quantization/quant.py", line 226, in <module>
    main(config, device, logger, vdl_writer)
  File "/mnt/workspace/PaddleOCR/deploy/slim/quantization/quant.py", line 207, in main
    program.train(
  File "/mnt/workspace/PaddleOCR/tools/program.py", line 367, in train
    loss = loss_class(preds, batch)
  File "/usr/local/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1429, in __call__
    return self.forward(*inputs, **kwargs)
  File "/mnt/workspace/PaddleOCR/ppocr/losses/combined_loss.py", line 70, in forward
    loss = loss_func(input, batch, **kargs)
  File "/usr/local/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1429, in __call__
    return self.forward(*inputs, **kwargs)
  File "/mnt/workspace/PaddleOCR/ppocr/losses/distillation_loss.py", line 111, in forward
    loss = super().forward(out1[self.dis_head], out2[self.dis_head])
  File "/mnt/workspace/PaddleOCR/ppocr/losses/basic_loss.py", line 129, in forward
    loss = (self._kldiv(log_out1, out2) + self._kldiv(log_out2, out1)) / 2.0
  File "/mnt/workspace/PaddleOCR/ppocr/losses/basic_loss.py", line 116, in _kldiv
    loss = target * (paddle.log(target + eps) - x)
ValueError: (InvalidArgument) Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [64, 40, 12] and the shape of Y = [64, 80, 12]. Received [40] in X is not equal to [80] in Y at i:1.
[Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at /paddle/paddle/phi/kernels/funcs/common_shape.h:73)

Environment:
Python 3.10.14
PaddleOCR 2.10.0

Training command:
python deploy/slim/quantization/quant.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation_v2.yml

Training config file:
Global:
  debug: false
  use_gpu: true
  epoch_num: 400
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_ppocr_v3_distillation
  save_epoch_step: 100
  eval_batch_step: [0, 200]
  cal_metric_during_train: true
  pretrained_model:
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/bank_dict.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3_distillation.txt
  d2s_train_image_shape: [3, 48, -1]

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Piecewise
    decay_epochs : [700]
    values : [0.0005, 0.00005]
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05

Architecture:
  model_type: &model_type "rec"
  name: DistillationModel
  algorithm: Distillation
  Models:
    Teacher:
      pretrained: ./lsm_data/pretrained_module/ch_PP-OCRv3_rec_slim_train/best_accuracy
      freeze_params: false
      return_all_feats: true
      model_type: *model_type
      algorithm: SVTR_LCNet
      Transform:
      Backbone:
        name: MobileNetV1Enhance
        scale: 0.5
        last_conv_stride: [1, 2]
        last_pool_type: avg
        last_pool_kernel_size: [2, 2]
      Head:
        name: MultiHead
        head_list:
          - CTCHead:
              Neck:
                name: svtr
                dims: 64
                depth: 2
                hidden_dims: 120
                use_guide: True
              Head:
                fc_decay: 0.00001
          - SARHead:
              enc_dim: 512
              max_text_length: *max_text_length
    Student:
      pretrained:
      freeze_params: false
      return_all_feats: true
      model_type: *model_type
      algorithm: SVTR_LCNet
      Transform:
      Backbone:
        name: MobileNetV3
        scale: 0.5
        last_conv_stride: [1, 2]
        last_pool_type: avg
        last_pool_kernel_size: [2, 2]
      Head:
        name: MultiHead
        head_list:
          - CTCHead:
              Neck:
                name: svtr
                dims: 64
                depth: 2
                hidden_dims: 120
                use_guide: True
              Head:
                fc_decay: 0.00001
          - SARHead:
              enc_dim: 512
              max_text_length: *max_text_length
Loss:
  name: CombinedLoss
  loss_config_list:
    - DistillationDMLLoss:
        weight: 1.0
        act: "softmax"
        use_log: true
        model_name_pairs:
          - ["Student", "Teacher"]
        key: head_out
        multi_head: True
        dis_head: ctc
        name: dml_ctc
    - DistillationDMLLoss:
        weight: 0.5
        act: "softmax"
        use_log: true
        model_name_pairs:
          - ["Student", "Teacher"]
        key: head_out
        multi_head: True
        dis_head: sar
        name: dml_sar
    - DistillationDistanceLoss:
        weight: 1.0
        mode: "l2"
        model_name_pairs:
          - ["Student", "Teacher"]
        key: backbone_out
    - DistillationCTCLoss:
        weight: 1.0
        model_name_list: ["Student", "Teacher"]
        key: head_out
        multi_head: True
    - DistillationSARLoss:
        weight: 1.0
        model_name_list: ["Student", "Teacher"]
        key: head_out
        multi_head: True
PostProcess:
  name: DistillationCTCLabelDecode
  model_name: ["Student", "Teacher"]
  key: head_out
  multi_head: True

Metric:
  name: DistillationMetric
  base_metric_name: RecMetric
  main_indicator: acc
  key: "Student"
  ignore_space: False

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./lsm_data/bank_rec1/train_image
    ext_op_transform_idx: 1
    label_file_list:
      - ./lsm_data/bank_rec1/rec_gt_train.txt
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - RecConAug:
          prob: 0.5
          ext_data_num: 2
          image_shape: [48, 320, 3]
          max_text_length: *max_text_length
      - RecAug:
      - MultiLabelEncode:
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_sar
            - length
            - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 64
    drop_last: true
    num_workers: 4
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./lsm_data/bank_rec1/test_image
    label_file_list:
      - ./lsm_data/bank_rec1/rec_gt_test.txt
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - MultiLabelEncode:
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_sar
            - length
            - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 64
    num_workers: 4

GreatV · 2025-04-07T05:56:54Z

Maintainer

The error you are seeing is:

Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [64, 40, 12] and the shape of Y = [64, 80, 12]. Received [40] in X is not equal to [80] in Y at i:1

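This means that in the distillation loss computation (DistillationDMLLoss), the Teacher and Student outputs being compared have different shapes: dimension 1 is 40 for one model and 80 for the other, so the element-wise KL-divergence computation fails. From the traceback, the failing expression is paddle.log(target + eps) - x inside _kldiv(log_out1, out2). Since model_name_pairs is ["Student", "Teacher"], target comes from the Teacher and x from the Student, so X = [64, 40, 12] is the Teacher output and Y = [64, 80, 12] is the Student output.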
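The Hint line at the end of the log spells out the per-axis rule: two dimensions are compatible if they are equal or if either one is at most 1. A minimal stdlib sketch of that check (first_broadcast_mismatch is a hypothetical helper, and it assumes both shapes have the same rank, as they do in this error) shows why axis 1 is the one that fails:

```python
from typing import Optional, Sequence


def first_broadcast_mismatch(x_shape: Sequence[int], y_shape: Sequence[int]) -> Optional[int]:
    """Return the first axis where the two shapes cannot broadcast, or None.

    Mirrors the per-axis condition quoted in the Hint:
    x_dims[i] == y_dims[i] or x_dims[i] <= 1 or y_dims[i] <= 1.
    Only handles equal-rank shapes; NumPy-style broadcasting would
    right-align shapes of different rank first.
    """
    if len(x_shape) != len(y_shape):
        raise ValueError("this sketch only handles shapes of equal rank")
    for i, (xd, yd) in enumerate(zip(x_shape, y_shape)):
        if not (xd == yd or xd <= 1 or yd <= 1):
            return i
    return None


# X and Y exactly as reported in the error message.
print(first_broadcast_mismatch([64, 40, 12], [64, 80, 12]))  # 1
print(first_broadcast_mismatch([64, 40, 12], [64, 40, 12]))  # None
```

Only the sequence-length axis disagrees; batch size (64) and class count (12) already match, which is why fixing the time-step dimension is enough.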

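This problem usually occurs in recognition models that use the MultiHead structure, when the configured Teacher and Student architectures differ and therefore produce outputs of different shapes.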

🔍 Locating the error:

According to your training config, the Student model uses:

Backbone: MobileNetV3,
while the Teacher model uses:
Backbone: MobileNetV1Enhance.

Although both use the same Neck (svtr) and Head (MultiHead), the different Backbones can produce feature maps of different widths, which shows up in the CTCHead output in particular.

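Combined with the error message:
X = [64, 40, 12] and Y = [64, 80, 12]
The second dimension (40 vs. 80) is most likely the time-step, i.e. sequence-length, dimension; the last dimension (12) is the number of character classes. In CTC recognition the number of time steps is determined jointly by the input image width and the horizontal downsampling factor of the feature extractor.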

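Possible causes:

The input image size is [3, 48, 320];
The downsampling in the two Backbones (together with the Neck) differs, so one model outputs 40 time steps and the other 80;
DistillationDMLLoss requires the Teacher and Student outputs it compares (e.g. head_out['ctc']) to have the same shape, but here they do not.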
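The arithmetic behind the two lengths is simple: for a CTC head, the number of time steps equals the input width divided by the total horizontal downsampling of the backbone (and neck). With the configured width of 320, lengths of 40 and 80 imply factors of 8 and 4. The sketch below only does that arithmetic; the factors are inferred from the error shapes rather than read from the PaddleOCR source, and both helper names are hypothetical.

```python
def ctc_time_steps(image_width: int, horizontal_downsample: int) -> int:
    """Number of CTC time steps for a given input width.

    Assumes the width is an exact multiple of the total horizontal stride,
    which holds for the 320-pixel width used in this config.
    """
    if image_width % horizontal_downsample:
        raise ValueError("width is not a multiple of the downsampling factor")
    return image_width // horizontal_downsample


def implied_downsample(image_width: int, time_steps: int) -> int:
    """Invert the relation: which total stride would yield `time_steps`?"""
    return image_width // time_steps


width = 320  # RecResizeImg image_shape: [3, 48, 320]
for steps in (40, 80):  # the two sequence lengths from the error message
    print(steps, "steps ->", implied_downsample(width, steps), "x downsampling")
# 40 steps -> 8 x downsampling
# 80 steps -> 4 x downsampling
```

In other words, the two backbones differ by a factor of 2 in horizontal stride, so no amount of tuning the loss will make the outputs line up unless the backbones (or the input widths) are changed.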

📌 Solutions:

The goal is to make sure the Student and Teacher outputs have matching dimensions, at least for the output used by the distillation loss (key='head_out', dis_head='ctc'). You can try the following:

Check that the backbone configurations are consistent

Change the Student backbone from MobileNetV3 to MobileNetV1Enhance so that it matches the Teacher, or otherwise use exactly the same structural parameters for both.

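Alternatively, follow the official PP-OCRv3 distillation config, which uses the same backbone for both the Teacher and the Student:

Backbone:
  name: MobileNetV1Enhance
  scale: 0.5
  last_conv_stride: [1, 2]
  last_pool_type: avg
  last_pool_kernel_size: [2, 2]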

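Even when the backbone type is the same, its parameters still have to match: the stride and pooling settings (last_conv_stride, last_pool_type, last_pool_kernel_size) determine the feature width, i.e. the number of time steps, while scale changes the channel count, which matters for the DistillationDistanceLoss on backbone_out.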
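Before retraining, it can help to diff the two Backbone sections programmatically. A minimal sketch, assuming the YAML has already been loaded into a dict (for example with PyYAML's yaml.safe_load); diff_backbones is a hypothetical helper, and the dict literal copies only the relevant part of the config posted in this thread:

```python
from typing import Any, Dict, List, Tuple


def diff_backbones(arch: Dict[str, Any],
                   a: str = "Student", b: str = "Teacher") -> List[Tuple[str, Any, Any]]:
    """List (key, value_in_a, value_in_b) for every Backbone key that differs."""
    bb_a = arch["Models"][a].get("Backbone") or {}
    bb_b = arch["Models"][b].get("Backbone") or {}
    diffs = []
    for key in sorted(set(bb_a) | set(bb_b)):
        if bb_a.get(key) != bb_b.get(key):  # missing keys compare as None
            diffs.append((key, bb_a.get(key), bb_b.get(key)))
    return diffs


# Subset of the Architecture section from the config above.
architecture = {
    "Models": {
        "Teacher": {"Backbone": {"name": "MobileNetV1Enhance", "scale": 0.5,
                                 "last_conv_stride": [1, 2], "last_pool_type": "avg",
                                 "last_pool_kernel_size": [2, 2]}},
        "Student": {"Backbone": {"name": "MobileNetV3", "scale": 0.5,
                                 "last_conv_stride": [1, 2], "last_pool_type": "avg",
                                 "last_pool_kernel_size": [2, 2]}},
    }
}
print(diff_backbones(architecture))
# [('name', 'MobileNetV3', 'MobileNetV1Enhance')]
```

An empty list means the Backbone sections are identical. The same comparison could be applied to the Head sections; this sketch deliberately checks only the Backbone.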

Set return_all_feats to False

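In your config, both Teacher and Student set return_all_feats: true, so each model returns several feature branches, and the distillation loss picks the head output selected by dis_head='ctc' or dis_head='sar'. If those tensors have different shapes, the KL-divergence computation fails. You could try setting it to False so that only the final output is returned, which makes consistency easier to maintain. Note, however, that this does not change the shape of the CTC output itself, and the losses above look up key: head_out and key: backbone_out, which the model may only return when return_all_feats is true; changing this setting alone may therefore trade the shape error for a missing-key error.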
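To see which model outputs the losses in this config depend on (and therefore what must still be returned if return_all_feats is changed), you can collect the key fields from loss_config_list. A minimal sketch, assuming the config has been loaded into Python objects (e.g. with PyYAML's yaml.safe_load); feature_keys_used is a hypothetical helper and the list below is an abridged copy of the posted config:

```python
from typing import Any, Dict, List, Set


def feature_keys_used(loss_config_list: List[Dict[str, Any]]) -> Set[str]:
    """Collect the `key` entries (model output names) referenced by the losses."""
    keys: Set[str] = set()
    for entry in loss_config_list:
        # Each list item is a single-key mapping: {LossClassName: {params...}}
        for params in entry.values():
            if params and "key" in params:
                keys.add(params["key"])
    return keys


# Abridged loss_config_list from the config above (weights etc. omitted).
loss_config_list = [
    {"DistillationDMLLoss": {"key": "head_out", "dis_head": "ctc"}},
    {"DistillationDMLLoss": {"key": "head_out", "dis_head": "sar"}},
    {"DistillationDistanceLoss": {"key": "backbone_out"}},
    {"DistillationCTCLoss": {"key": "head_out"}},
    {"DistillationSARLoss": {"key": "head_out"}},
]
print(sorted(feature_keys_used(loss_config_list)))
# ['backbone_out', 'head_out']
```

Whether the model still returns these entries with return_all_feats: false is worth confirming in ppocr/modeling/architectures/base_model.py before changing the setting.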

Adjust the distillation loss dis_head setting or add a resize step

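If one output is longer than the other (here 80 vs. 40 time steps), you can add a call to paddle.nn.functional.interpolate in the distillation loss implementation (distillation_loss.py or basic_loss.py) to bring both to the same length.

For example, at the top of _kldiv in basic_loss.py, before the loss is computed:

# x: log-probabilities [N, T_x, C]; target: probabilities [N, T_t, C]
# mode='bilinear' only accepts 4-D input, so resample the 3-D tensor with
# mode='linear' after moving the time axis T to the last position (NCW layout).
if x.shape[1] != target.shape[1]:
    target = paddle.nn.functional.interpolate(
        target.transpose([0, 2, 1]),
        size=[x.shape[1]],
        mode="linear",
        data_format="NCW",
    ).transpose([0, 2, 1])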

This is only a stopgap, though; keeping the two model architectures consistent should be the first priority.
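To see what the resize workaround does to the distributions, here is a pure-Python sketch of linear resampling along the time axis of a [T, C] sequence of per-step probability vectors. It uses the common half-pixel (align_corners=False) mapping; it illustrates the idea, and it is an assumption that it matches paddle.nn.functional.interpolate exactly. Because every output step is a convex combination of two input steps, each resampled row still sums to 1.

```python
from typing import List


def resample_time_axis(seq: List[List[float]], new_len: int) -> List[List[float]]:
    """Linearly resample a [T, C] sequence to [new_len, C] along T.

    Half-pixel mapping: src = (dst + 0.5) * T / new_len - 0.5, clamped to [0, T - 1].
    """
    old_len = len(seq)
    scale = old_len / new_len
    out = []
    for dst in range(new_len):
        src = min(max((dst + 0.5) * scale - 0.5, 0.0), old_len - 1)
        lo = int(src)                # floor, since src >= 0
        hi = min(lo + 1, old_len - 1)
        w = src - lo                 # weight of the right neighbour
        out.append([(1 - w) * a + w * b for a, b in zip(seq[lo], seq[hi])])
    return out


# Hypothetical 4-step, 3-class probability sequence resampled to 8 steps.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6], [0.3, 0.3, 0.4]]
up = resample_time_axis(probs, 8)
print(len(up), [round(sum(row), 6) for row in up])
```

The same function works for downsampling (e.g. 80 to 40 steps). Keep in mind that the resampled sequence no longer corresponds one-to-one to image columns, which is one reason this remains a workaround rather than a fix.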

Use the officially recommended PaddleOCR structure combination

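You can refer to the recommended configs in the official PaddleOCR repository:

ch_PP-OCRv3_rec_distillation.yml;
ch_PP-OCRv3_rec_mobile_train.yml (Student model);
ch_PP-OCRv3_rec_server_train.yml (Teacher model);
(the file names of the standalone Student/Teacher configs may differ between releases; check configs/rec/PP-OCRv3/ in your checkout)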

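Or compare directly against the examples in configs/rec/rec_distillation for the structure and pretrained weight paths, and check whether the problem is caused by pretrained weights that are incompatible with the current structure.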

📎 Recommendation:

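In your current setup you replaced the Student's Backbone with MobileNetV3 yourself. I suggest changing it back to MobileNetV1Enhance to match the Teacher, and keeping the Neck and Head parameters identical. Once the structures are unified, retraining should no longer hit the Broadcast error.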

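If the problem persists, you can use the following debugging steps:

In distillation_loss.py, print out1[self.dis_head].shape and out2[self.dis_head].shape just before the super().forward(...) call (line 111 in the traceback);
Use paddle.summary (which needs an input_size, e.g. (1, 3, 48, 320) for this config) or print each module's output to check whether the structures are compatible;
Try using only CTCHead without SARHead, to simplify the model for debugging.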
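For the first debugging step, rather than eyeballing printed shapes, you can compare them per head. A minimal sketch, assuming you have collected the shapes into plain dicts (for example list(t.shape) for each head entry of out1 and out2 inside DistillationDMLLoss.forward); mismatched_heads is a hypothetical helper, and the only shapes filled in are X and Y from the error message:

```python
from typing import Dict, List, Optional, Tuple

Shape = List[int]


def mismatched_heads(shapes_a: Dict[str, Shape],
                     shapes_b: Dict[str, Shape]) -> Dict[str, Tuple[Optional[Shape], Optional[Shape]]]:
    """Return {head: (shape_a, shape_b)} for every head whose shapes differ.

    A head present in only one dict is reported with None on the other side.
    """
    return {
        head: (shapes_a.get(head), shapes_b.get(head))
        for head in sorted(set(shapes_a) | set(shapes_b))
        if shapes_a.get(head) != shapes_b.get(head)
    }


# X and Y from the error message; other heads were not reported in the log.
shapes_x = {"ctc": [64, 40, 12]}
shapes_y = {"ctc": [64, 80, 12]}
print(mismatched_heads(shapes_x, shapes_y))
# {'ctc': ([64, 40, 12], [64, 80, 12])}
```

Running this on the real 'ctc' and 'sar' outputs (and on backbone_out for the DistillationDistanceLoss) shows in one pass every place where the two models disagree, instead of fixing them one traceback at a time.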

For the related official config, see:
https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml

I hope these suggestions help you solve the problem. If it still isn't resolved, feel free to post your latest logs and config.

Response generated by 🤖 feifei-bot | chatgpt-4o-latest

⚠️ This is an AI-generated response. Please verify any critical information.
