Questions about fine-tuning with the TAL_OCR_MATH primary-school arithmetic formula dataset #14491
Bestboy125 asked this question in Q&A
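From your description, there are several possible reasons why the math symbols are not recognized correctly after fine-tuning.
Cause analysis
Solutions
Based on the possible causes above, you can try the following:
1. Check that the dictionary and the character mapping are correct
2. Expand the dataset
3. Try retraining the model
4. Adjust the fine-tuning configuration
5. Optimize the loss function
6. Check the training logs and visualizations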
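For item 1, a quick sanity check is to confirm that every character in the label file appears in the dictionary and that every dictionary entry is exactly one character. As far as I know, PaddleOCR's label encoder walks the label one character at a time and silently drops characters it cannot find in the dictionary, so a mismatch produces no error, only missing targets. The sketch below assumes PaddleOCR's usual formats (one dictionary entry per line, label lines written as `image_path<TAB>text`); `load_dict` and `check_labels` are hypothetical helper names, not PaddleOCR APIs. The per-character counts also show how rare each symbol is compared with the digits, which bears on item 2.

```python
import sys
from collections import Counter
from pathlib import Path


def load_dict(dict_path):
    """Read a dictionary file with one entry per line (PaddleOCR-style, assumed)."""
    return Path(dict_path).read_text(encoding="utf-8").splitlines()


def check_labels(dict_entries, label_lines, use_space_char=True):
    """Return (bad dict entries, label chars missing from the dict, char counts).

    Label lines are assumed to look like "image_path<TAB>text".
    """
    # An entry that is empty or longer than one character (e.g. "÷ " with a
    # trailing space) can never match a single character of a label.
    bad_entries = [entry for entry in dict_entries if len(entry) != 1]
    charset = set(dict_entries)
    if use_space_char:
        charset.add(" ")  # use_space_char: true adds a space class
    counts, missing = Counter(), Counter()
    for line in label_lines:
        line = line.rstrip("\r\n")
        if "\t" not in line:
            continue  # skip malformed lines
        _, text = line.split("\t", 1)
        for ch in text:
            counts[ch] += 1
            if ch not in charset:
                missing[ch] += 1
    return bad_entries, missing, counts


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_labels.py dict.txt rec_gt_train_no_f.txt
    labels = Path(sys.argv[2]).read_text(encoding="utf-8").splitlines()
    bad, missing, counts = check_labels(load_dict(sys.argv[1]), labels)
    print("dict entries that are not one character:", bad)
    print("label characters missing from the dict:", missing.most_common())
    print("character frequencies:", counts.most_common())
```

If the first two lists are not empty, the affected symbols were never real training targets, which would explain why they are skipped at inference.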
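Example adjustments (for the configuration file)
The following adjustments target your configuration:
# Learning-rate adjustment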
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001 # raise the learning rate
    warmup_epoch: 5

# Data augmentation
Train:
  dataset:
    transforms:
    - RecConAug:
        prob: 0.7 # increase the augmentation probability
        image_shape: [48, 320, 3]
    - RecAug:
        aug_prob: 0.5

# Post-processing parameters
PostProcess:
  name: CTCLabelDecode
  ctc_beam_search: True # enable beam search decoding

Summary
I hope these methods help you solve the problem! Response generated by feifei-bot | chatgpt-4o-latest
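In the `lr` block of the suggested configuration, `learning_rate` is the peak value and `warmup_epoch` is the number of epochs spent ramping up to it before the cosine decay begins. A minimal sketch of that shape follows. It assumes a linear warmup from 0 and a cosine decay to 0 over the remaining iterations; PaddleOCR's own `Cosine` scheduler may count steps slightly differently, and `cosine_with_warmup` is a hypothetical helper, not a PaddleOCR function.

```python
import math


def cosine_with_warmup(step, base_lr, total_steps, warmup_steps):
    """Learning rate at iteration `step`: linear warmup, then cosine decay to 0.

    A generic sketch; PaddleOCR's `Cosine` scheduler may differ in details.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    t = min(step - warmup_steps, decay_steps)
    # Half a cosine period: base_lr at t = 0, 0 at t = decay_steps.
    return 0.5 * base_lr * (1 + math.cos(math.pi * t / decay_steps))


if __name__ == "__main__":
    # Hypothetical numbers: 200 epochs, 5 warmup epochs, 150 iterations per epoch.
    steps_per_epoch = 150
    total, warmup = 200 * steps_per_epoch, 5 * steps_per_epoch
    for epoch in (0, 2, 5, 50, 100, 199):
        lr = cosine_with_warmup(epoch * steps_per_epoch, 1e-3, total, warmup)
        print(f"epoch {epoch:3d}: lr = {lr:.2e}")
```

The number of iterations per epoch depends on the dataset size, the number of GPUs and the batch size, so the real curve has the same shape but a different time axis.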
The dataset format is as follows:


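In this dataset, math symbols such as (, ), x, ÷ and the remainder sign are mapped to other characters, and I used these mapped characters directly as the GT and the dictionary. There are 30,000 samples. After fine-tuning the recognition model, inference skips most of the math symbols entirely; only digits and the unmapped symbols -, = and + are recognized.
In this situation, should I train the model from scratch instead of fine-tuning it, or is something wrong with my fine-tuning process?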
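One way to see why symbols can vanish without a trace: the CTC head predicts one class per horizontal slice (frame) of the image, and greedy decoding keeps the arg-max class of each frame, merges consecutive repeats, and drops the blank class. If the model never learned a symbol, the blank wins at those frames and nothing is emitted, which matches the skipped symbols described above. The sketch below assumes the blank sits at index 0 (PaddleOCR's `CTCLabelDecode` convention, as far as I know); `ctc_greedy_decode` is a hypothetical helper and the scores are made-up numbers.

```python
def ctc_greedy_decode(frame_probs, characters):
    """Greedy CTC decoding over per-frame class scores.

    Class 0 is the CTC blank; class i (i >= 1) is characters[i - 1].
    """
    vocab = ["<blank>"] + list(characters)
    decoded = []
    prev = None
    for row in frame_probs:
        best = max(range(len(row)), key=row.__getitem__)  # arg-max class
        # Emit only when the class changes and is not the blank.
        if best != prev and best != 0:
            decoded.append(vocab[best])
        prev = best
    return "".join(decoded)


chars = ["1", "2", "÷"]  # toy dictionary
# Made-up scores per frame, in the order [blank, "1", "2", "÷"].
confident = [[0.1, 0.8, 0.05, 0.05], [0.1, 0.8, 0.05, 0.05],
             [0.2, 0.0, 0.1, 0.7], [0.1, 0.0, 0.85, 0.05]]
unsure = [[0.1, 0.8, 0.05, 0.05], [0.1, 0.8, 0.05, 0.05],
          [0.6, 0.0, 0.0, 0.4], [0.1, 0.0, 0.85, 0.05]]
print(ctc_greedy_decode(confident, chars))  # 1÷2
print(ctc_greedy_decode(unsure, chars))     # 12 (the ÷ frame goes to blank)
```

Because the decoder only reports which non-blank classes won, a weakly learned symbol produces no error or placeholder; it simply disappears from the output string.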
Below are my dictionary and the training-set labels for the recognition model:
dict.txt
rec_gt_train.txt
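If the dataset's mapping writes a symbol as a multi-character string (for example, if the remainder sign were stored as the two-character `……`), putting that string on one line of `dict.txt` will not help, because, as far as I can tell, PaddleOCR's label encoding compares labels one character at a time. A minimal sketch of converting such tokens to single characters before training follows. The token table is purely hypothetical, so substitute the dataset's actual mapping and pick replacement characters that never occur elsewhere in the labels; `remap_label` is likewise a hypothetical helper.

```python
def remap_label(text, token_map):
    """Replace multi-character tokens with single characters (longest match first)."""
    tokens = sorted(token_map, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for tok in tokens:
            if text.startswith(tok, i):
                out.append(token_map[tok])
                i += len(tok)
                break
        else:
            # No token starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical mapping; the real TAL_OCR_MATH tokens may differ.
# "\u2026\u2026" is two ellipsis characters ("……").
TOKEN_MAP = {"\u2026\u2026": "※", "\\div": "÷"}

print(remap_label("7\\div2=3\u2026\u20261", TOKEN_MAP))  # 7÷2=3※1
```

The same single replacement characters then go into `dict.txt`, one per line, and the reverse mapping can be applied to the recognized text after inference.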
Here is the training configuration file:
Global:
  debug: false
  use_gpu: true
  epoch_num: 200
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/no_f_math_paddle_v4
  save_epoch_step: 10
  eval_batch_step: [0, 2000]
  cal_metric_during_train: true
  pretrained_model: /opt/data/private/envs/paddle_ocr/ch_PP-OCRv4_rec_train/student.pdparams
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: /opt/data/private/envs/paddle_ocr/PaddleOCR/dict.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3.txt
  d2s_train_image_shape: [3, 48, 320]
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05
Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform:
  Backbone:
    name: PPLCNetV3
    scale: 0.95
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 120
            depth: 2
            hidden_dims: 120
            kernel_size: [1, 3]
            use_guide: True
          Head:
            fc_decay: 0.00001
      - NRTRHead:
          nrtr_dim: 384
          max_text_length: *max_text_length
Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - NRTRLoss:
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
Train:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: ./train_data/rec
    ext_op_transform_idx: 1
    label_file_list:
    - /opt/data/private/envs/paddle_ocr/PaddleOCR/train_data/rec/rec_gt_train_no_f.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape: [48, 320, 3]
        max_text_length: *max_text_length
    - RecAug:
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  sampler:
    name: MultiScaleSampler
    scales: [[320, 32], [320, 48], [320, 64]]
    first_bs: &bs 192
    fix_bs: false
    divided_factor: [8, 16] # w, h
    is_training: True
  loader:
    shuffle: true
    batch_size_per_card: *bs
    drop_last: true
    num_workers: 8
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data
    label_file_list:
    - /opt/data/private/envs/paddle_ocr/PaddleOCR/train_data/rec/rec_gt_train_no_f.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 128
    num_workers: 4
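Two settings in this configuration are worth checking against the data. First, pointing `character_dict_path` at a new dictionary changes the size of the CTC output layer, and (as far as I know) PaddleOCR skips loading any pretrained tensor whose shape no longer matches and logs a warning, so those layers start from random weights even though the run is nominally a fine-tune. Second, `max_text_length: 25` caps label length, and I believe the label encoder discards samples whose labels are longer. The sketch below computes the expected number of CTC classes (one per dictionary line, plus one for the space when `use_space_char: true`, plus the blank) and lists over-long labels. The helper names and the command-line usage are hypothetical.

```python
import sys
from pathlib import Path


def ctc_num_classes(dict_entries, use_space_char=True):
    """Expected CTC output size: dict entries + optional space + blank (assumed layout)."""
    return len(dict_entries) + (1 if use_space_char else 0) + 1


def overlong_labels(label_lines, max_text_length=25):
    """Return (image_path, label_length) for labels longer than max_text_length."""
    result = []
    for line in label_lines:
        line = line.rstrip("\r\n")
        if "\t" not in line:
            continue  # skip malformed lines
        path, text = line.split("\t", 1)
        if len(text) > max_text_length:
            result.append((path, len(text)))
    return result


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python check_cfg.py dict.txt pretrained_dict.txt rec_gt_train_no_f.txt
    new_dict = Path(sys.argv[1]).read_text(encoding="utf-8").splitlines()
    old_dict = Path(sys.argv[2]).read_text(encoding="utf-8").splitlines()
    print("classes with new dict:", ctc_num_classes(new_dict))
    print("classes with pretrained dict:", ctc_num_classes(old_dict))
    labels = Path(sys.argv[3]).read_text(encoding="utf-8").splitlines()
    print("labels longer than 25 characters:", len(overlong_labels(labels)))
```

For the pretrained dictionary, `ppocr/utils/ppocr_keys_v1.txt` is, I believe, the one the Chinese PP-OCRv4 recognition model was trained with. If the two class counts differ, the start of the training log should show which pretrained parameters were not loaded.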