Replies: 1 comment
Based on the error message you provided:

ValueError: Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [64, 40, 12] and the shape of Y = [64, 80, 12]. Received [40] in X is not equal to [80] in Y at i:1.

the failure happens during the KL-divergence computation in distillation_loss.py → basic_loss.py → forward: the two tensors involved have different shapes, so they cannot be broadcast against each other. This is usually caused by the Student and Teacher models producing outputs with different shapes. In your config, the second dimension of the CTCHead output differs between the two models: one is [64, 40, 12] and the other [64, 80, 12], so they disagree on axis=1 (40 != 80). A minimal repro of the failing operation is sketched below.
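This is a minimal, self-contained sketch (only paddle is assumed to be installed) of the element-wise operation that fails; it mirrors the loss = target * (paddle.log(target + eps) - x) line from basic_loss.py in your traceback, with random tensors standing in for the real logits:

import paddle

student_logits = paddle.rand([64, 40, 12])   # [B, T, C] from one model
teacher_logits = paddle.rand([64, 80, 12])   # [B, T, C] from the other
eps = 1e-10

# Element-wise ops need every axis to match (or be broadcastable);
# 40 != 80 on axis 1, so this line raises the same "Broadcast dimension mismatch" error.
loss = teacher_logits * (paddle.log(teacher_logits + eps) - student_logits)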
📌 Cause: the CTCHead output shape is [B, T, C] (batch size, time steps, number of classes), and T differs between the two models. Your Teacher and Student models use different backbones: the Teacher uses MobileNetV1Enhance while the Student uses MobileNetV3, and the two need not downsample the input width by the same factor, so the heads can end up with different T.
Because you return all features for distillation (return_all_feats: true) and both models use a MultiHead structure with the same head_list order, the loss pairs the two models' heads one-to-one; since the paired forward outputs differ in shape, the KLDiv loss throws the mismatch error, as KL(student_logits, teacher_logits) cannot be computed directly.

📌 Solutions:

✅ Option 1: align the output shapes of Teacher and Student. You need the T (time-step) dimension of the distilled outputs to be identical for both models; a quick way to compare the two backbones is sketched below.
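A rough sketch (not part of the original answer) for comparing how the two backbones from your Architecture section downsample a 48x320 input. It assumes the build_backbone helper from ppocr.modeling.backbones behaves as in current PaddleOCR and that the script is run from the repo root:

import paddle
from ppocr.modeling.backbones import build_backbone

x = paddle.rand([1, 3, 48, 320])  # [N, C, H, W], matching the RecResizeImg image_shape
teacher_bb = build_backbone({"name": "MobileNetV1Enhance", "scale": 0.5,
                             "last_conv_stride": [1, 2], "last_pool_type": "avg",
                             "last_pool_kernel_size": [2, 2]}, model_type="rec")
# Student backbone built with scale only; check whether MobileNetV3 actually
# supports the last_* keys you pass in your config.
student_bb = build_backbone({"name": "MobileNetV3", "scale": 0.5}, model_type="rec")
teacher_bb.eval()
student_bb.eval()
print("Teacher backbone out:", teacher_bb(x).shape)
print("Student backbone out:", student_bb(x).shape)

If the last dimension (feature-map width) printed for the two backbones is not the same, the CTC heads will also produce different T values.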
✅ Option 2: distill only the heads or features whose shapes match. Since the distillation loss you enabled is multi-head, you can temporarily remove the head with the mismatched shape (e.g. sar or ctc) from the loss_config_list in the Loss section, for example by dropping the DistillationDMLLoss entry whose dis_head points at the offending head.
✅ Option 3: check the input image size so that a consistent shape propagates through both models. In your data preprocessing you set RecResizeImg with image_shape: [3, 48, 320] for both Train and Eval.
Please make sure the downsample rate in front of each model's head is the same; otherwise the final T will differ because of the combination of input size and downsampling. You can confirm this by printing the shape of the ctc head logits of the Student and Teacher models, e.g. add a debug log for preds["Student"]["head_out"]["ctc"].shape and preds["Teacher"]["head_out"]["ctc"].shape, as in the snippet below.
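As a hypothetical example, two temporary print statements placed just above the failing call in tools/program.py (the loss = loss_class(preds, batch) line from your traceback) would show both shapes on a real training batch:

# temporary debug output; remove once the shapes are confirmed
print("Student ctc:", preds["Student"]["head_out"]["ctc"].shape)
print("Teacher ctc:", preds["Teacher"]["head_out"]["ctc"].shape)
loss = loss_class(preds, batch)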
📑 Suggested references:

🚨 Summary: at its core, your problem is that the two models' output shapes do not match (40 != 80), which makes the KLDivLoss computation fail. Suggestions:
To localize the issue further, temporarily print and compare the output shape of each head. For more effective debugging, run a single batch on its own and print the intermediate output dimensions of every head (the debug snippet above does exactly this for the ctc head).

Response generated by 🤖 feifei-bot | chatgpt-4o-latest |
🔎 Search before asking
🐛 Bug (problem description)
The error is as follows:
[2025/04/05 14:19:51] ppocr INFO: train from scratch
[2025/04/05 14:19:51] ppocr INFO: train from scratch
[2025/04/05 14:19:51] ppocr INFO: train dataloader has 43 iters, valid dataloader has 8 iters
[2025/04/05 14:19:51] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 200 iterations
Traceback (most recent call last):
File "/mnt/workspace/PaddleOCR/deploy/slim/quantization/quant.py", line 226, in
main(config, device, logger, vdl_writer)
File "/mnt/workspace/PaddleOCR/deploy/slim/quantization/quant.py", line 207, in main
program.train(
File "/mnt/workspace/PaddleOCR/tools/program.py", line 367, in train
loss = loss_class(preds, batch)
File "/usr/local/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1429, in call
return self.forward(*inputs, **kwargs)
File "/mnt/workspace/PaddleOCR/ppocr/losses/combined_loss.py", line 70, in forward
loss = loss_func(input, batch, **kargs)
File "/usr/local/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1429, in call
return self.forward(*inputs, **kwargs)
File "/mnt/workspace/PaddleOCR/ppocr/losses/distillation_loss.py", line 111, in forward
loss = super().forward(out1[self.dis_head], out2[self.dis_head])
File "/mnt/workspace/PaddleOCR/ppocr/losses/basic_loss.py", line 129, in forward
loss = (self._kldiv(log_out1, out2) + self._kldiv(log_out2, out1)) / 2.0
File "/mnt/workspace/PaddleOCR/ppocr/losses/basic_loss.py", line 116, in _kldiv
loss = target * (paddle.log(target + eps) - x)
ValueError: (InvalidArgument) Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [64, 40, 12] and the shape of Y = [64, 80, 12]. Received [40] in X is not equal to [80] in Y at i:1.
[Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at /paddle/paddle/phi/kernels/funcs/common_shape.h:73)
🏃‍♂️ Environment
Python 3.10.14
PaddleOCR 2.10.0
🌰 Minimal Reproducible Example
Training script:
python deploy/slim/quantization/quant.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation_v2.yml
Training config file:
Global:
debug: false
use_gpu: true
epoch_num: 400
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_ppocr_v3_distillation
save_epoch_step: 100
eval_batch_step: [0, 200]
cal_metric_during_train: true
pretrained_model:
checkpoints:
save_inference_dir:
use_visualdl: false
infer_img: doc/imgs_words/ch/word_1.jpg
character_dict_path: ppocr/utils/bank_dict.txt
max_text_length: &max_text_length 25
infer_mode: false
use_space_char: true
distributed: true
save_res_path: ./output/rec/predicts_ppocrv3_distillation.txt
d2s_train_image_shape: [3, 48, -1]
Optimizer:
name: Adam
beta1: 0.9
beta2: 0.999
lr:
name: Piecewise
decay_epochs : [700]
values : [0.0005, 0.00005]
warmup_epoch: 5
regularizer:
name: L2
factor: 3.0e-05
Architecture:
model_type: &model_type "rec"
name: DistillationModel
algorithm: Distillation
Models:
Teacher:
pretrained: ./lsm_data/pretrained_module/ch_PP-OCRv3_rec_slim_train/best_accuracy
freeze_params: false
return_all_feats: true
model_type: *model_type
algorithm: SVTR_LCNet
Transform:
Backbone:
name: MobileNetV1Enhance
scale: 0.5
last_conv_stride: [1, 2]
last_pool_type: avg
last_pool_kernel_size: [2, 2]
Head:
name: MultiHead
head_list:
- CTCHead:
Neck:
name: svtr
dims: 64
depth: 2
hidden_dims: 120
use_guide: True
Head:
fc_decay: 0.00001
- SARHead:
enc_dim: 512
max_text_length: *max_text_length
Student:
pretrained:
freeze_params: false
return_all_feats: true
model_type: *model_type
algorithm: SVTR_LCNet
Transform:
Backbone:
name: MobileNetV3
scale: 0.5
last_conv_stride: [1, 2]
last_pool_type: avg
last_pool_kernel_size: [2, 2]
Head:
name: MultiHead
head_list:
- CTCHead:
Neck:
name: svtr
dims: 64
depth: 2
hidden_dims: 120
use_guide: True
Head:
fc_decay: 0.00001
- SARHead:
enc_dim: 512
max_text_length: *max_text_length
Loss:
  name: CombinedLoss
  loss_config_list:
  - DistillationDMLLoss:
      weight: 1.0
      act: "softmax"
      use_log: true
      model_name_pairs:
      - ["Student", "Teacher"]
      key: head_out
      multi_head: True
      dis_head: ctc
      name: dml_ctc
  - DistillationDMLLoss:
      weight: 0.5
      act: "softmax"
      use_log: true
      model_name_pairs:
      - ["Student", "Teacher"]
      key: head_out
      multi_head: True
      dis_head: sar
      name: dml_sar
  - DistillationDistanceLoss:
      weight: 1.0
      mode: "l2"
      model_name_pairs:
      - ["Student", "Teacher"]
      key: backbone_out
  - DistillationCTCLoss:
      weight: 1.0
      model_name_list: ["Student", "Teacher"]
      key: head_out
      multi_head: True
  - DistillationSARLoss:
      weight: 1.0
      model_name_list: ["Student", "Teacher"]
      key: head_out
      multi_head: True
PostProcess:
name: DistillationCTCLabelDecode
model_name: ["Student", "Teacher"]
key: head_out
multi_head: True
Metric:
name: DistillationMetric
base_metric_name: RecMetric
main_indicator: acc
key: "Student"
ignore_space: False
Train:
dataset:
name: SimpleDataSet
data_dir: ./lsm_data/bank_rec1/train_image
ext_op_transform_idx: 1
label_file_list:
- ./lsm_data/bank_rec1/rec_gt_train.txt
transforms:
- DecodeImage:
img_mode: BGR
channel_first: false
- RecConAug:
prob: 0.5
ext_data_num: 2
image_shape: [48, 320, 3]
max_text_length: *max_text_length
- RecAug:
- MultiLabelEncode:
- RecResizeImg:
image_shape: [3, 48, 320]
- KeepKeys:
keep_keys:
- image
- label_ctc
- label_sar
- length
- valid_ratio
loader:
shuffle: true
batch_size_per_card: 64
drop_last: true
num_workers: 4
Eval:
dataset:
name: SimpleDataSet
data_dir: ./lsm_data/bank_rec1/test_image
label_file_list:
- ./lsm_data/bank_rec1/rec_gt_test.txt
transforms:
- DecodeImage:
img_mode: BGR
channel_first: false
- MultiLabelEncode:
- RecResizeImg:
image_shape: [3, 48, 320]
- KeepKeys:
keep_keys:
- image
- label_ctc
- label_sar
- length
- valid_ratio
loader:
shuffle: false
drop_last: false
batch_size_per_card: 64
num_workers: 4