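The error you hit is:
Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [64, 40, 12] and the shape of Y = [64, 80, 12]. Received [40] in X is not equal to [80] in Y at i:1
This means that during the distillation loss computation (here DistillationDMLLoss), one pair of teacher and student outputs has different shapes. They differ on dimension 1: one model emits 40 time steps and the other 80, so the element-wise KL-divergence computation fails. Going by the traceback, _kldiv(log_out1, out2) is called for the pair ["Student", "Teacher"] and the failing expression is paddle.log(target + eps) - x, so X = 40 appears to be the Teacher output and Y = 80 the Student output.
This problem typically shows up in recognition models that use a MultiHead structure when the configured teacher and student architectures differ.
🔍 Locating the error:
According to your training config, the student model uses: Backbone: MobileNetV3,
Although both models use the same Neck (svtr) and Head (MultiHead), the different backbones can produce feature maps of different sizes, which matters most in the CTCHead.
Combined with the error message:
Possible causes: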
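📌 Solution:
The idea is to make sure the Student and Teacher head outputs have matching dimensions, at least for the outputs the distillation loss selects with key='head_out' and dis_head='ctc'. You can try the following: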
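Change the Student backbone from MobileNetV3 to MobileNetV1Enhance so that it matches the Teacher, or otherwise use identical structural parameters. Alternatively, use the lightweight combination officially recommended for PP-OCRv3 for both models, for example: Backbone:
Even with the same backbone, the parameters need to match: the stride and pooling settings determine the feature-map width (the number of time steps), while scale sets the channel count.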
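The two lengths in the error line up with simple stride arithmetic on the 320-pixel input width from image_shape: 8x total horizontal downsampling gives 40 steps and 4x gives 80. The sketch below only illustrates that arithmetic. The helper `output_width` and both stride lists are hypothetical and are not taken from either backbone's source.

```python
def output_width(input_width, width_strides):
    """Return the feature-map width after applying each horizontal stride in turn."""
    width = input_width
    for stride in width_strides:
        # Assumes padding that keeps ceil(width / stride) columns per stage.
        width = -(-width // stride)
    return width


INPUT_WIDTH = 320  # from image_shape: [3, 48, 320] in the config

# Hypothetical stride lists, chosen only to give 8x and 4x total downsampling.
strides_8x = [2, 2, 2]
strides_4x = [2, 2]

print(output_width(INPUT_WIDTH, strides_8x))  # 40
print(output_width(INPUT_WIDTH, strides_4x))  # 80
```

Comparing the horizontal strides and pooling of the two backbones in this way is a quick check on whether their sequence lengths can agree at all.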
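In your config, both Teacher and Student set return_all_feats: true, so each model returns several feature branches, and the distillation module picks the output named by dis_head='ctc' or dis_head='sar' from the multiple heads. If those tensors differ in shape, computing the KL divergence on them directly raises the error. You can try setting it to false so that only the main output is returned, which may make the two models easier to keep consistent.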
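If one output is longer than the other (for example 80 time steps against 40), you can add nn.functional.interpolate to the distillation loss implementation, in distillation_loss.py or basic_loss.py, to bring both to the same length. For example, add before the forward computation:
if x.shape[1] != target.shape[1]:
This is only a temporary workaround, though; keeping the model structures consistent should be the first choice.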
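The length-matching idea can be sketched without any framework. The function below is a hypothetical, plain-Python illustration of linear resampling along the time axis of a [T][C] sequence; it is not the code from the reply, and in the real loss the same step would presumably be applied to tensors with the interpolate function the reply mentions.

```python
def resample_time_axis(seq, new_len):
    """Linearly resample a [T][C] nested list to [new_len][C] along the time axis."""
    old_len = len(seq)
    if old_len == 0 or new_len <= 0:
        raise ValueError("both lengths must be positive")
    if old_len == 1 or new_len == 1:
        # Nothing to interpolate between: repeat the first step.
        return [list(seq[0]) for _ in range(new_len)]
    out = []
    for j in range(new_len):
        # Map output step j onto the input axis so that both endpoints line up.
        pos = j * (old_len - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        frac = pos - lo
        out.append([(1 - frac) * a + frac * b for a, b in zip(seq[lo], seq[hi])])
    return out


short = [[1.0, 0.0], [0.0, 1.0]]      # 2 time steps, 2 classes
print(resample_time_axis(short, 3))   # [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
```

Each resampled row is a weighted average of two neighbouring rows, so rows that were probability distributions remain distributions. The resampling still changes which time step lines up with which, which is one reason to treat it as a stopgap.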
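You can refer to the recommended configs in the official PaddleOCR repository:
Or compare directly against the examples under configs/rec/rec_distillation, checking the structure and the pretrained weight paths, to see whether the pretrained model is incompatible with the current structure.
📎 Recommendation:
Your current setup replaces the student Backbone with MobileNetV3. Change it back to MobileNetV1Enhance to match the Teacher, and keep the Neck and Head parameters identical. Once the structures are unified, retrain; the Broadcast error should go away.
If the problem persists, you can use the following debug methods: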
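A minimal sketch of one such check: print the shape of every tensor in each model's output just before the loss is computed. It assumes the predictions are nested dicts keyed by model name, then by a key such as head_out, then by head name, as the config and traceback suggest. `collect_shapes` and `FakeTensor` are hypothetical names, and the assignment of 40 and 80 to the two models is illustrative only.

```python
def collect_shapes(outputs, prefix=""):
    """Walk a nested dict of model outputs and return {dotted.path: shape}."""
    shapes = {}
    for key, value in outputs.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            shapes.update(collect_shapes(value, path))
        elif hasattr(value, "shape"):
            shapes[path] = list(value.shape)
    return shapes


class FakeTensor:
    """Stand-in for a framework tensor; it only carries a shape."""

    def __init__(self, shape):
        self.shape = shape


# Hypothetical predictions laid out as model -> key -> head.
preds = {
    "Student": {"head_out": {"ctc": FakeTensor([64, 80, 12])}},
    "Teacher": {"head_out": {"ctc": FakeTensor([64, 40, 12])}},
}

for path, shape in collect_shapes(preds).items():
    print(path, shape)
```

With real predictions in place of `preds`, any pair of paths whose shapes differ on the middle axis points at the branch that breaks the loss.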
For the related official configs, see:
I hope these suggestions help. If the problem remains, feel free to post the latest logs and config.
Response generated by 🤖 feifei-bot | chatgpt-4o-latest
The error is as follows:
[2025/04/05 14:19:51] ppocr INFO: train from scratch
[2025/04/05 14:19:51] ppocr INFO: train from scratch
[2025/04/05 14:19:51] ppocr INFO: train dataloader has 43 iters, valid dataloader has 8 iters
[2025/04/05 14:19:51] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 200 iterations
Traceback (most recent call last):
  File "/mnt/workspace/PaddleOCR/deploy/slim/quantization/quant.py", line 226, in <module>
    main(config, device, logger, vdl_writer)
  File "/mnt/workspace/PaddleOCR/deploy/slim/quantization/quant.py", line 207, in main
    program.train(
  File "/mnt/workspace/PaddleOCR/tools/program.py", line 367, in train
    loss = loss_class(preds, batch)
  File "/usr/local/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1429, in __call__
    return self.forward(*inputs, **kwargs)
  File "/mnt/workspace/PaddleOCR/ppocr/losses/combined_loss.py", line 70, in forward
    loss = loss_func(input, batch, **kargs)
  File "/usr/local/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1429, in __call__
    return self.forward(*inputs, **kwargs)
  File "/mnt/workspace/PaddleOCR/ppocr/losses/distillation_loss.py", line 111, in forward
    loss = super().forward(out1[self.dis_head], out2[self.dis_head])
  File "/mnt/workspace/PaddleOCR/ppocr/losses/basic_loss.py", line 129, in forward
    loss = (self._kldiv(log_out1, out2) + self._kldiv(log_out2, out1)) / 2.0
  File "/mnt/workspace/PaddleOCR/ppocr/losses/basic_loss.py", line 116, in _kldiv
    loss = target * (paddle.log(target + eps) - x)
ValueError: (InvalidArgument) Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [64, 40, 12] and the shape of Y = [64, 80, 12]. Received [40] in X is not equal to [80] in Y at i:1.
[Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at /paddle/paddle/phi/kernels/funcs/common_shape.h:73)
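The hint on the last line spells out the rule applied per axis: two sizes are compatible when they are equal or when one of them is at most 1. A minimal sketch of that check in plain Python reproduces the failure at axis 1. The function name `first_broadcast_mismatch` is made up for illustration, and only shapes of equal rank are handled.

```python
def first_broadcast_mismatch(x_dims, y_dims):
    """Return the first axis where two equal-rank shapes cannot broadcast, or None."""
    if len(x_dims) != len(y_dims):
        raise ValueError("this sketch only handles shapes of equal rank")
    for i, (x, y) in enumerate(zip(x_dims, y_dims)):
        # Same condition as in the hint: equal sizes, or one side is 1 (or less).
        if not (x == y or x <= 1 or y <= 1):
            return i
    return None


print(first_broadcast_mismatch([64, 40, 12], [64, 80, 12]))  # 1
print(first_broadcast_mismatch([64, 40, 12], [64, 1, 12]))   # None
```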
Environment:
Python 3.10.14
PaddleOCR 2.10.0
Training command:
python deploy/slim/quantization/quant.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation_v2.yml
Training config file:
Global:
debug: false
use_gpu: true
epoch_num: 400
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_ppocr_v3_distillation
save_epoch_step: 100
eval_batch_step: [0, 200]
cal_metric_during_train: true
pretrained_model:
checkpoints:
save_inference_dir:
use_visualdl: false
infer_img: doc/imgs_words/ch/word_1.jpg
character_dict_path: ppocr/utils/bank_dict.txt
max_text_length: &max_text_length 25
infer_mode: false
use_space_char: true
distributed: true
save_res_path: ./output/rec/predicts_ppocrv3_distillation.txt
d2s_train_image_shape: [3, 48, -1]
Optimizer:
name: Adam
beta1: 0.9
beta2: 0.999
lr:
name: Piecewise
decay_epochs : [700]
values : [0.0005, 0.00005]
warmup_epoch: 5
regularizer:
name: L2
factor: 3.0e-05
Architecture:
model_type: &model_type "rec"
name: DistillationModel
algorithm: Distillation
Models:
Teacher:
pretrained: ./lsm_data/pretrained_module/ch_PP-OCRv3_rec_slim_train/best_accuracy
freeze_params: false
return_all_feats: true
model_type: *model_type
algorithm: SVTR_LCNet
Transform:
Backbone:
name: MobileNetV1Enhance
scale: 0.5
last_conv_stride: [1, 2]
last_pool_type: avg
last_pool_kernel_size: [2, 2]
Head:
name: MultiHead
head_list:
Neck:
name: svtr
dims: 64
depth: 2
hidden_dims: 120
use_guide: True
Head:
fc_decay: 0.00001
enc_dim: 512
max_text_length: *max_text_length
Student:
pretrained:
freeze_params: false
return_all_feats: true
model_type: *model_type
algorithm: SVTR_LCNet
Transform:
Backbone:
name: MobileNetV3
scale: 0.5
last_conv_stride: [1, 2]
last_pool_type: avg
last_pool_kernel_size: [2, 2]
Head:
name: MultiHead
head_list:
Neck:
name: svtr
dims: 64
depth: 2
hidden_dims: 120
use_guide: True
Head:
fc_decay: 0.00001
enc_dim: 512
max_text_length: *max_text_length
Loss:
name: CombinedLoss
loss_config_list:
DistillationDMLLoss:
weight: 1.0
act: "softmax"
use_log: true
model_name_pairs:
["Student", "Teacher"]
key: head_out
multi_head: True
dis_head: ctc
name: dml_ctc
DistillationDMLLoss:
weight: 0.5
act: "softmax"
use_log: true
model_name_pairs:
["Student", "Teacher"]
key: head_out
multi_head: True
dis_head: sar
name: dml_sar
DistillationDistanceLoss:
weight: 1.0
mode: "l2"
model_name_pairs:
["Student", "Teacher"]
key: backbone_out
DistillationCTCLoss:
weight: 1.0
model_name_list: ["Student", "Teacher"]
key: head_out
multi_head: True
DistillationSARLoss:
weight: 1.0
model_name_list: ["Student", "Teacher"]
key: head_out
multi_head: True
PostProcess:
name: DistillationCTCLabelDecode
model_name: ["Student", "Teacher"]
key: head_out
multi_head: True
Metric:
name: DistillationMetric
base_metric_name: RecMetric
main_indicator: acc
key: "Student"
ignore_space: False
Train:
dataset:
name: SimpleDataSet
data_dir: ./lsm_data/bank_rec1/train_image
ext_op_transform_idx: 1
label_file_list:
transforms:
img_mode: BGR
channel_first: false
prob: 0.5
ext_data_num: 2
image_shape: [48, 320, 3]
max_text_length: *max_text_length
image_shape: [3, 48, 320]
keep_keys:
loader:
shuffle: true
batch_size_per_card: 64
drop_last: true
num_workers: 4
Eval:
dataset:
name: SimpleDataSet
data_dir: ./lsm_data/bank_rec1/test_image
label_file_list:
transforms:
img_mode: BGR
channel_first: false
image_shape: [3, 48, 320]
keep_keys:
loader:
shuffle: false
drop_last: false
batch_size_per_card: 64
num_workers: 4