Replies: 1 comment
Based on the error message you provided:

ValueError: Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [64, 40, 12] and the shape of Y = [64, 80, 12]. Received [40] in X is not equal to [80] in Y at i:1.

the failure happens during the KL-divergence computation in distillation_loss.py → basic_loss.py → forward: the two tensors involved have different shapes, so they cannot be broadcast against each other. This is usually caused by the Student and Teacher models producing outputs with different shapes. In your config, the second dimension of the CTCHead output differs between the two models: one is [64, 40, 12] and the other [64, 80, 12], so they disagree on axis=1 (40 != 80). A minimal repro of the failing operation is sketched below.
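This is a minimal, self-contained sketch (only paddle is assumed to be installed) of the element-wise operation that fails; it mirrors the loss = target * (paddle.log(target + eps) - x) line from basic_loss.py in your traceback, with random tensors standing in for the real logits:

import paddle

student_logits = paddle.rand([64, 40, 12])   # [B, T, C] from one model
teacher_logits = paddle.rand([64, 80, 12])   # [B, T, C] from the other
eps = 1e-10

# Element-wise ops need every axis to match (or be broadcastable);
# 40 != 80 on axis 1, so this line raises the same "Broadcast dimension mismatch" error.
loss = teacher_logits * (paddle.log(teacher_logits + eps) - student_logits)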
📌 Cause: the CTCHead output shape is [B, T, C] (batch size, time steps, number of classes), and T differs between the two models. Your Teacher and Student models use different backbones: the Teacher uses MobileNetV1Enhance while the Student uses MobileNetV3, and the two need not downsample the input width by the same factor, so the heads can end up with different T.
Because you return all features for distillation (return_all_feats: true) and both models use a MultiHead structure with the same head_list order, the loss pairs the two models' heads one-to-one; since the paired forward outputs differ in shape, the KLDiv loss throws the mismatch error, as KL(student_logits, teacher_logits) cannot be computed directly.

📌 Solutions:

✅ Option 1: align the output shapes of Teacher and Student. You need the T (time-step) dimension of the distilled outputs to be identical for both models; a quick way to compare the two backbones is sketched below.
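A rough sketch (not part of the original answer) for comparing how the two backbones from your Architecture section downsample a 48x320 input. It assumes the build_backbone helper from ppocr.modeling.backbones behaves as in current PaddleOCR and that the script is run from the repo root:

import paddle
from ppocr.modeling.backbones import build_backbone

x = paddle.rand([1, 3, 48, 320])  # [N, C, H, W], matching the RecResizeImg image_shape
teacher_bb = build_backbone({"name": "MobileNetV1Enhance", "scale": 0.5,
                             "last_conv_stride": [1, 2], "last_pool_type": "avg",
                             "last_pool_kernel_size": [2, 2]}, model_type="rec")
# Student backbone built with scale only; check whether MobileNetV3 actually
# supports the last_* keys you pass in your config.
student_bb = build_backbone({"name": "MobileNetV3", "scale": 0.5}, model_type="rec")
teacher_bb.eval()
student_bb.eval()
print("Teacher backbone out:", teacher_bb(x).shape)
print("Student backbone out:", student_bb(x).shape)

If the last dimension (feature-map width) printed for the two backbones is not the same, the CTC heads will also produce different T values.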
✅ Option 2: distill only the heads or features whose shapes match. Since the distillation loss you enabled is multi-head, you can temporarily remove the head with the mismatched shape (e.g. sar or ctc) from the loss_config_list in the Loss section, for example by dropping the DistillationDMLLoss entry whose dis_head points at the offending head.
✅ Option 3: check the input image size so that a consistent shape propagates through both models. In your data preprocessing you set RecResizeImg with image_shape: [3, 48, 320] for both Train and Eval.
Please make sure the downsample rate in front of each model's head is the same; otherwise the final T will differ because of the combination of input size and downsampling. You can confirm this by printing the shape of the ctc head logits of the Student and Teacher models, e.g. add a debug log for preds["Student"]["head_out"]["ctc"].shape and preds["Teacher"]["head_out"]["ctc"].shape, as in the snippet below.
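As a hypothetical example, two temporary print statements placed just above the failing call in tools/program.py (the loss = loss_class(preds, batch) line from your traceback) would show both shapes on a real training batch:

# temporary debug output; remove once the shapes are confirmed
print("Student ctc:", preds["Student"]["head_out"]["ctc"].shape)
print("Teacher ctc:", preds["Teacher"]["head_out"]["ctc"].shape)
loss = loss_class(preds, batch)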
📑 Suggested references:

🚨 Summary: at its core, your problem is that the two models' output shapes do not match (40 != 80), which makes the KLDivLoss computation fail. Suggestions:
To localize the issue further, temporarily print and compare the output shape of each head. For more effective debugging, run a single batch on its own and print the intermediate output dimensions of every head (the debug snippet above does exactly this for the ctc head).

Response generated by 🤖 feifei-bot | chatgpt-4o-latest |
🔎 Search before asking
🐛 Bug (problem description)
The error is as follows:
[2025/04/05 14:19:51] ppocr INFO: train from scratch
[2025/04/05 14:19:51] ppocr INFO: train from scratch
[2025/04/05 14:19:51] ppocr INFO: train dataloader has 43 iters, valid dataloader has 8 iters
[2025/04/05 14:19:51] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 200 iterations
Traceback (most recent call last):
File "/mnt/workspace/PaddleOCR/deploy/slim/quantization/quant.py", line 226, in
main(config, device, logger, vdl_writer)
File "/mnt/workspace/PaddleOCR/deploy/slim/quantization/quant.py", line 207, in main
program.train(
File "/mnt/workspace/PaddleOCR/tools/program.py", line 367, in train
loss = loss_class(preds, batch)
File "/usr/local/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1429, in call
return self.forward(*inputs, **kwargs)
File "/mnt/workspace/PaddleOCR/ppocr/losses/combined_loss.py", line 70, in forward
loss = loss_func(input, batch, **kargs)
File "/usr/local/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1429, in call
return self.forward(*inputs, **kwargs)
File "/mnt/workspace/PaddleOCR/ppocr/losses/distillation_loss.py", line 111, in forward
loss = super().forward(out1[self.dis_head], out2[self.dis_head])
File "/mnt/workspace/PaddleOCR/ppocr/losses/basic_loss.py", line 129, in forward
loss = (self._kldiv(log_out1, out2) + self._kldiv(log_out2, out1)) / 2.0
File "/mnt/workspace/PaddleOCR/ppocr/losses/basic_loss.py", line 116, in _kldiv
loss = target * (paddle.log(target + eps) - x)
ValueError: (InvalidArgument) Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [64, 40, 12] and the shape of Y = [64, 80, 12]. Received [40] in X is not equal to [80] in Y at i:1.
[Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at /paddle/paddle/phi/kernels/funcs/common_shape.h:73)
🏃‍♂️ Environment
Python 3.10.14
PaddleOCR 2.10.0
🌰 Minimal Reproducible Example
Training script:
python deploy/slim/quantization/quant.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation_v2.yml
Training config file:
Global:
debug: false
use_gpu: true
epoch_num: 400
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_ppocr_v3_distillation
save_epoch_step: 100
eval_batch_step: [0, 200]
cal_metric_during_train: true
pretrained_model:
checkpoints:
save_inference_dir:
use_visualdl: false
infer_img: doc/imgs_words/ch/word_1.jpg
character_dict_path: ppocr/utils/bank_dict.txt
max_text_length: &max_text_length 25
infer_mode: false
use_space_char: true
distributed: true
save_res_path: ./output/rec/predicts_ppocrv3_distillation.txt
d2s_train_image_shape: [3, 48, -1]
Optimizer:
name: Adam
beta1: 0.9
beta2: 0.999
lr:
name: Piecewise
decay_epochs : [700]
values : [0.0005, 0.00005]
warmup_epoch: 5
regularizer:
name: L2
factor: 3.0e-05
Architecture:
model_type: &model_type "rec"
name: DistillationModel
algorithm: Distillation
Models:
Teacher:
pretrained: ./lsm_data/pretrained_module/ch_PP-OCRv3_rec_slim_train/best_accuracy
freeze_params: false
return_all_feats: true
model_type: *model_type
algorithm: SVTR_LCNet
Transform:
Backbone:
name: MobileNetV1Enhance
scale: 0.5
last_conv_stride: [1, 2]
last_pool_type: avg
last_pool_kernel_size: [2, 2]
Head:
name: MultiHead
head_list:
- CTCHead:
Neck:
name: svtr
dims: 64
depth: 2
hidden_dims: 120
use_guide: True
Head:
fc_decay: 0.00001
- SARHead:
enc_dim: 512
max_text_length: *max_text_length
Student:
pretrained:
freeze_params: false
return_all_feats: true
model_type: *model_type
algorithm: SVTR_LCNet
Transform:
Backbone:
name: MobileNetV3
scale: 0.5
last_conv_stride: [1, 2]
last_pool_type: avg
last_pool_kernel_size: [2, 2]
Head:
name: MultiHead
head_list:
- CTCHead:
Neck:
name: svtr
dims: 64
depth: 2
hidden_dims: 120
use_guide: True
Head:
fc_decay: 0.00001
- SARHead:
enc_dim: 512
max_text_length: *max_text_length
Loss:
  name: CombinedLoss
  loss_config_list:
  - DistillationDMLLoss:
      weight: 1.0
      act: "softmax"
      use_log: true
      model_name_pairs:
      - ["Student", "Teacher"]
      key: head_out
      multi_head: True
      dis_head: ctc
      name: dml_ctc
  - DistillationDMLLoss:
      weight: 0.5
      act: "softmax"
      use_log: true
      model_name_pairs:
      - ["Student", "Teacher"]
      key: head_out
      multi_head: True
      dis_head: sar
      name: dml_sar
  - DistillationDistanceLoss:
      weight: 1.0
      mode: "l2"
      model_name_pairs:
      - ["Student", "Teacher"]
      key: backbone_out
  - DistillationCTCLoss:
      weight: 1.0
      model_name_list: ["Student", "Teacher"]
      key: head_out
      multi_head: True
  - DistillationSARLoss:
      weight: 1.0
      model_name_list: ["Student", "Teacher"]
      key: head_out
      multi_head: True
PostProcess:
name: DistillationCTCLabelDecode
model_name: ["Student", "Teacher"]
key: head_out
multi_head: True
Metric:
name: DistillationMetric
base_metric_name: RecMetric
main_indicator: acc
key: "Student"
ignore_space: False
Train:
dataset:
name: SimpleDataSet
data_dir: ./lsm_data/bank_rec1/train_image
ext_op_transform_idx: 1
label_file_list:
- ./lsm_data/bank_rec1/rec_gt_train.txt
transforms:
- DecodeImage:
img_mode: BGR
channel_first: false
- RecConAug:
prob: 0.5
ext_data_num: 2
image_shape: [48, 320, 3]
max_text_length: *max_text_length
- RecAug:
- MultiLabelEncode:
- RecResizeImg:
image_shape: [3, 48, 320]
- KeepKeys:
keep_keys:
- image
- label_ctc
- label_sar
- length
- valid_ratio
loader:
shuffle: true
batch_size_per_card: 64
drop_last: true
num_workers: 4
Eval:
dataset:
name: SimpleDataSet
data_dir: ./lsm_data/bank_rec1/test_image
label_file_list:
- ./lsm_data/bank_rec1/rec_gt_test.txt
transforms:
- DecodeImage:
img_mode: BGR
channel_first: false
- MultiLabelEncode:
- RecResizeImg:
image_shape: [3, 48, 320]
- KeepKeys:
keep_keys:
- image
- label_ctc
- label_sar
- length
- valid_ratio
loader:
shuffle: false
drop_last: false
batch_size_per_card: 64
num_workers: 4