Chinese v4 model: per-iteration acc is very high during training, but actual recognition produces extra or missing characters #16109
ZiHenCheng asked this question in Q&A
Comment from the original poster: If there is any information you need me to add, please let me know.
Reply: Judging from the training log, your evaluation metric is only a bit over 40%, so recognition errors are expected. The acc is high during training because the two are computed differently: the acc reported during training is the accuracy of the current batch, not the global accuracy.
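To make that difference concrete, here is a minimal sketch with made-up numbers (not PaddleOCR's metric code): the value in the training log is the accuracy of the current batch alone, while the "cur metric" line accumulates correct/total over the entire eval set before dividing once.

# Illustration only; all numbers are hypothetical, not taken from PaddleOCR.
train_batches = [(16, 16), (15, 16), (16, 16)]  # (correct, batch_size) for recent training batches
for correct, total in train_batches:
    print("train batch acc:", correct / total)   # close to 1.0 on data the model has already fit

eval_correct, eval_total = 493, 1178             # accumulated over the whole eval set
print("global eval acc:", eval_correct / eval_total)  # ~0.42, like the "cur metric" acc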
The original question:
paddleocr 2.6
paddlepaddle-gpu 2.4.2.post117
Recognition frequently produces extra or missing characters. The training set currently has 14,000+ images (the character dictionary has been changed). Output like the following shows up regularly during training:
[2025/07/22 12:17:05] ppocr INFO: epoch: [198/200], global_step: 4750, lr: 0.000001, acc: 0.999998, norm_edit_dis: 1.000000, CTCLoss: 0.011095, NRTRLoss: 1.208231, loss: 1.219588, avg_reader_cost: 0.00014 s, avg_batch_cost: 0.17315 s, avg_samples: 9.2, ips: 53.13352 samples/s, eta: 0:00:10, max_mem_reserved: 3786 MB, max_mem_allocated: 3491 MB
eval model:: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 74/74 [00:06<00:00, 11.35it/s]
[2025/07/22 12:17:11] ppocr INFO: cur metric, acc: 0.4185848216659436, norm_edit_dis: 0.8920313446664886, fps: 210.18005090029894
[2025/07/22 12:17:11] ppocr INFO: best metric, acc: 0.4671781716353097, is_float16: False, norm_edit_dis: 0.9008529489931351, fps: 210.71439580887272, best_epoch: 92
Even when Train & Eval both use the exact same dataset and gt.txt, acc still cannot reach 0.9x.
And in actual rec inference there are extra characters, e.g. ground truth: 140007, OCR output: 1400007.
Has anyone run into the same situation?
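As a side note on the two reported metrics: acc counts a sample as correct only on an exact string match, while norm_edit_dis still gives near-full credit for a single inserted character, which is why acc can sit around 0.42 while norm_edit_dis stays near 0.89 in the log above. A rough sketch (not PaddleOCR's actual RecMetric implementation) using the example from this post:

# Compare exact-match accuracy with normalized edit distance for "140007" vs "1400007".
def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming Levenshtein distance.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

gt, pred = "140007", "1400007"          # ground truth vs. OCR output from this post
acc = float(pred == gt)                 # exact match -> 0.0
ned = 1 - edit_distance(pred, gt) / max(len(pred), len(gt), 1)  # 1 - 1/7, about 0.857
print(acc, round(ned, 3))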
yaml:
Global:
  debug: false # whether to enable debug mode; recommended off for better performance
  use_gpu: true # whether to train on GPU (can also be false to train on CPU)
  epoch_num: 100 # total number of training epochs (can be larger, e.g. 300, depending on dataset size)
  log_smooth_window: 20 # number of batches used to smooth the logged values; any integer
  print_batch_step: 10 # print training info every N batches; any positive integer
  save_model_dir: ./output/XXXXXXXXXXXXX # directory for saving models; customizable
  save_epoch_step: 50 # save the model every N epochs; any integer
  eval_batch_step: [0, 20] # [step to start evaluation, evaluate every N steps]; format [int, int]
  cal_metric_during_train: true # whether to compute acc on the fly during training (true / false)
  pretrained_model: XXXXXXXXXXXXX # may be left empty to train without a pretrained model
  checkpoints: # fill in the latest model directory to resume training, otherwise leave empty
  save_inference_dir: # directory for exporting the inference model, e.g. ./inference_model
  use_visualdl: true # whether to enable VisualDL (Paddle's TensorBoard equivalent)
  infer_img: doc/imgs_words/ch/word_1.jpg # sample image for inference testing; customizable
  character_dict_path: XXXXXXXXXXXXX.txt # character dictionary file; contents can be customized
  max_text_length: &max_text_length 50 # maximum expected text length; affects the loss and prediction length
  infer_mode: false # inference mode (only enable for inference jobs)
  use_space_char: true # whether the space character is included as a recognizable character
  distributed: false # whether to use multi-GPU training (false for a single GPU)
  save_res_path: ./output/rec/predicts_ppocrv3.txt # where evaluation/inference results are saved
freeze_params:
- Backbone
- Head.0.Neck
Optimizer:
  name: AdamW # supported options: Adam, AdamW, SGD, RMSProp
  beta1: 0.9 # applies to Adam-family optimizers
  beta2: 0.999 # same as above
  lr:
    name: Cosine # supported: Cosine, Piecewise, PolynomialDecay, Step, Linear
    learning_rate: 0.0003 # initial learning rate (commonly 0.001 to 0.0001)
    warmup_epoch: 2 # number of warmup epochs; set according to model size
  lr:
    name: Piecewise
    decay_epochs: [10, 20, 30]
    boundaries: [10, 20, 30]
    values: [0.001, 0.0003, 0.0001, 0.00005]
  regularizer:
    name: L2 # supported: L2, L1
    factor: 5.0e-05 # regularization strength
Architecture:
  model_type: rec # supported: rec (recognition), det (detection), etc.
  algorithm: SVTR_LCNet # options include: CRNN, Rosetta, NRTR, SVTR_LCNet, ViTSTR, etc. ####
  Transform: # supported: TPS, STN, etc.; empty means not used
  Backbone:
    name: PPLCNetV3 # options such as MobileNetV1/V3, ResNet, PPLCNetV3, SVTRNet, etc.
    scale: 0.95 # model width scaling factor, e.g. 0.5, 1.0, 0.95
  Head:
    name: MultiHead # can be CTCHead, AttentionHead, MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr # neck options such as svtr, rnn
            dims: 120
            depth: 2
            hidden_dims: 120
            kernel_size: [1, 3]
            use_guide: True
          Head:
            fc_decay: 0.00001
      - NRTRHead:
          nrtr_dim: 384
          max_text_length: *max_text_length
Loss:
  name: MultiLoss
  loss_config_list: # supported losses: CTCLoss, AttentionLoss, NRTRLoss, SARLoss, etc.
    - CTCLoss:
        weight: 0.5
    - NRTRLoss:
        weight: 0.5
PostProcess:
  name: CTCLabelDecode # use AttnLabelDecode for Attention-based models
Metric:
  name: RecMetric # supported: RecMetric, DetMetric
  main_indicator: acc # or norm_edit_dis (average edit distance)
Train:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: XXXXXXXXXXXXX
    ext_op_transform_idx: 1
    label_file_list:
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.3
        ext_data_num: 4
        image_shape: [3, 48, 640]
        max_text_length: *max_text_length
    - RecAug:
        use_tia: True
        aug_prob: 0.8
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  sampler:
    name: MultiScaleSampler # or use DistributedBatchSampler, BatchSampler
    scales: [[640, 24], [640, 32], [640, 40], [640, 48], [640, 56], [640, 64]]
    first_bs: &bs 16
    fix_bs: false
    divided_factor: [8, 16]
    is_training: true
  loader:
    shuffle: true
    batch_size_per_card: *bs
    drop_last: true
    num_workers: 1
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: XXXXXXXXXXXXX
    label_file_list:
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - RecResizeImg:
        image_shape: [3, 48, 640] # evaluation uses a single fixed size
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 16
    num_workers: 1
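Since the character dictionary has been changed, one thing worth verifying is that every character appearing in the training/eval labels exists in character_dict_path; characters missing from the dictionary cannot be encoded or decoded and surface as recognition errors. A small hypothetical helper (not part of PaddleOCR; the file names below are placeholders matching the XXXXXXXXXXXXX entries in the config):

# Check label/dictionary consistency. Paths are placeholders.
def load_dict(dict_path: str) -> set:
    with open(dict_path, encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f if line.rstrip("\n")}

def check_labels(label_file: str, dict_path: str) -> None:
    charset = load_dict(dict_path)
    charset.add(" ")  # because use_space_char: true in the config above
    missing = {}
    with open(label_file, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t", 1)  # SimpleDataSet format: "<image path>\t<label>"
            if len(parts) != 2:
                continue
            for ch in parts[1]:
                if ch not in charset:
                    missing[ch] = missing.get(ch, 0) + 1
    print("characters not covered by the dictionary:", missing or "none")

check_labels("gt.txt", "XXXXXXXXXXXXX.txt")  # placeholder paths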