Key information extraction configuration files #14844
Unanswered
eunij-peanut
asked this question in
Q&A
Replies: 1 comment 1 reply
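1. Global parameters (Global):
2. Network architecture (Architecture):
3. Loss function (Loss):
4. Optimizer (Optimizer):
5. Post-processing (PostProcess):
6. Evaluation metric (Metric):
7. Training dataset (Train) & validation dataset (Eval):
8. Data loader parameters (loader):
Summary
For a more detailed official description, see the PaddleOCR KIE documentation. Response generated by 🤖 feifei-bot | chatgpt-4o-latest |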
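The sections listed in this answer correspond to the top-level keys of the two config files, with `loader` nested under `Train` and `Eval` next to `dataset`. As a rough sketch (not part of PaddleOCR), a parsed config could be checked for these sections as follows. The helper name `missing_sections` and the trimmed `example` dict are hypothetical; in practice the dict would come from a YAML parser.

```python
# Top-level sections named in the answer; `loader` sits inside Train and Eval.
EXPECTED_SECTIONS = [
    "Global", "Architecture", "Loss", "Optimizer",
    "PostProcess", "Metric", "Train", "Eval",
]


def missing_sections(config):
    """Return the expected section names that are absent from a parsed config dict."""
    missing = [name for name in EXPECTED_SECTIONS if name not in config]
    for split in ("Train", "Eval"):
        section = config.get(split) or {}
        for sub in ("dataset", "loader"):
            if sub not in section:
                missing.append(f"{split}.{sub}")
    return missing


# Hypothetical, heavily trimmed stand-in for a parsed config file.
example = {
    "Global": {"epoch_num": 200},
    "Architecture": {"model_type": "kie"},
    "Loss": {"name": "LossFromOutput"},
    "Optimizer": {"name": "AdamW"},
    "PostProcess": {"name": "VQAReTokenLayoutLMPostProcess"},
    "Metric": {"name": "VQAReTokenMetric"},
    "Train": {"dataset": {}, "loader": {}},
    "Eval": {"dataset": {}},  # loader deliberately left out
}

print(missing_sections(example))  # ['Eval.loader']
```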
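Could you explain the parameters in the two configuration files re_vi_layoutxlm_xfund_zh.yml and ser_vi_layoutxlm_xfund_zh.yml in detail, or is there documentation somewhere that describes them?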
ser_vi_layoutxlm_xfund_zh.yml:

```yaml
Global:
  use_gpu: True
  epoch_num: &epoch_num 1000
  log_smooth_window: 10
  print_batch_step: 10
  save_model_dir: ./output/qc_2_ser
  save_epoch_step: 100
  # evaluation is run every 10 iterations after the 0th iteration
  eval_batch_step: [ 0, 29 ] # [ 0, 19 ]
  cal_metric_during_train: False
  save_inference_dir:
  use_visualdl: False
  seed: 2025
  infer_img: /home/aistudio/work/PaddleOCR/train_data/qc_2/imgs/val
  d2s_train_image_shape: [3, 448, 640] # [3, 224, 224]
  # if you want to predict using the groundtruth ocr info,
  # you can use the following config
  # infer_img: /home/aistudio/work/PaddleOCR/train_data/zzsfp/imgs/b25.jpg
  # infer_mode: False
  save_res_path: ./output/ser/qc_2
  kie_rec_model_dir:
  kie_det_model_dir:
  amp_custom_white_list: ['scale', 'concat', 'elementwise_add']

Architecture:
  model_type: kie
  algorithm: &algorithm "LayoutXLM"
  Transform:
  Backbone:
    name: LayoutXLMForSer
    pretrained: True
    ignore_mismatched_sizes: True
    checkpoints:
    # one of base or vi
    mode: vi
    num_classes: &num_classes 77

Loss:
  name: VQASerTokenLayoutLMLoss
  num_classes: *num_classes
  key: "backbone_out"

Optimizer:
  name: AdamW
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Linear
    learning_rate: 0.00005
    epochs: *epoch_num
    warmup_epoch: 2
  regularizer:
    name: L2
    factor: 0.00000

PostProcess:
  name: VQASerTokenLayoutLMPostProcess
  class_path: &class_path /home/aistudio/work/PaddleOCR/train_data/qc_3/class_list.txt

Metric:
  name: VQASerTokenMetric
  main_indicator: hmean

Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/work/PaddleOCR/train_data/qc_3/imgs/train
    label_file_list:
      - /home/aistudio/work/PaddleOCR/train_data/qc_3/Label_ser_train_processed.json
    ratio_list: [ 1.0 ]
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - VQATokenLabelEncode: # Class handling label
          contains_re: False
          algorithm: *algorithm
          class_path: *class_path
          use_textline_bbox_info: &use_textline_bbox_info True
          # one of [None, "tb-yx"]
          order_method: &order_method "tb-yx"
      - VQATokenPad:
          max_seq_len: &max_seq_len 512
          return_attention_mask: True
      - VQASerTokenChunk:
          max_seq_len: *max_seq_len
      - Resize:
          size: [448,640]
      - NormalizeImage:
          scale: 1
          mean: [ 123.675, 116.28, 103.53 ]
          std: [ 58.395, 57.12, 57.375 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels'] # dataloader will return list in this order
  loader:
    shuffle: True
    drop_last: False
    batch_size_per_card: 8
    num_workers: 0

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/work/PaddleOCR/train_data/qc_3/imgs/val
    label_file_list:
      - /home/aistudio/work/PaddleOCR/train_data/qc_3/Label_ser_val_processed.json
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - VQATokenLabelEncode: # Class handling label
          contains_re: False
          algorithm: *algorithm
          class_path: *class_path
          use_textline_bbox_info: *use_textline_bbox_info
          order_method: *order_method
      - VQATokenPad:
          max_seq_len: *max_seq_len
          return_attention_mask: True
      - VQASerTokenChunk:
          max_seq_len: *max_seq_len
      - Resize:
          size: [448,640]
      - NormalizeImage:
          scale: 1
          mean: [ 123.675, 116.28, 103.53 ]
          std: [ 58.395, 57.12, 57.375 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 8
    num_workers: 0
```

re_vi_layoutxlm_xfund_zh.yml:

```yaml
Global:
  use_gpu: True
  epoch_num: &epoch_num 200
  log_smooth_window: 10
  print_batch_step: 10
  save_model_dir: /home/aistudio/work/PaddleOCR/output/qc_1_re
  save_epoch_step: 30
  # evaluation is run every 10 iterations after the 0th iteration
  eval_batch_step: [ 0, 19 ]
  cal_metric_during_train: False
  save_inference_dir:
  use_visualdl: False
  seed: 2022
  infer_img: /home/aistudio/work/PaddleOCR/train_data/qc/imgs/quality_feb_4.png
  save_res_path: ./output/re/xfund_zh/
  kie_rec_model_dir:
  kie_det_model_dir:

Architecture:
  model_type: kie
  algorithm: &algorithm "LayoutXLM"
  Transform:
  Backbone:
    name: LayoutXLMForRe
    pretrained: True
    mode: vi
    checkpoints:

Loss:
  name: LossFromOutput
  key: loss
  reduction: mean

Optimizer:
  name: AdamW
  beta1: 0.9
  beta2: 0.999
  clip_norm: 10
  lr:
    learning_rate: 0.00005 # raw:0.00005
    warmup_epoch: 10
  regularizer:
    name: L2
    factor: 0.00000

PostProcess:
  name: VQAReTokenLayoutLMPostProcess

Metric:
  name: VQAReTokenMetric
  main_indicator: hmean

Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/work/PaddleOCR/train_data/qc/imgs
    label_file_list:
      - /home/aistudio/work/PaddleOCR/train_data/qc/train_kie.json
    ratio_list: [ 1.0 ]
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - VQATokenLabelEncode: # Class handling label
          contains_re: True
          algorithm: *algorithm
          class_path: &class_path /home/aistudio/work/PaddleOCR/train_data/qc/class_list.txt
          use_textline_bbox_info: &use_textline_bbox_info True
          order_method: &order_method "tb-yx"
      - VQATokenPad:
          max_seq_len: &max_seq_len 512
          return_attention_mask: True
      - VQAReTokenRelation:
      - VQAReTokenChunk:
          max_seq_len: *max_seq_len
      - TensorizeEntitiesRelations:
      - Resize:
          size: [224,224]
      - NormalizeImage:
          scale: 1
          mean: [ 123.675, 116.28, 103.53 ]
          std: [ 58.395, 57.12, 57.375 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'input_ids', 'bbox','attention_mask', 'token_type_ids', 'entities', 'relations'] # dataloader will return list in this order
  loader:
    shuffle: True
    drop_last: False
    batch_size_per_card: 32
    num_workers: 0

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/work/PaddleOCR/train_data/qc/imgs
    label_file_list:
      - /home/aistudio/work/PaddleOCR/train_data/qc/val_kie.json
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - VQATokenLabelEncode: # Class handling label
          contains_re: True
          algorithm: *algorithm
          class_path: *class_path
          use_textline_bbox_info: *use_textline_bbox_info
          order_method: *order_method
      - VQATokenPad:
          max_seq_len: *max_seq_len
          return_attention_mask: True
      - VQAReTokenRelation:
      - VQAReTokenChunk:
          max_seq_len: *max_seq_len
      - TensorizeEntitiesRelations:
      - Resize:
          size: [224,224]
      - NormalizeImage:
          scale: 1
          mean: [ 123.675, 116.28, 103.53 ]
          std: [ 58.395, 57.12, 57.375 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'entities', 'relations'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 32
    num_workers: 0
```
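One setting that is easy to misread is `eval_batch_step`. The comment in both configs says evaluation runs "every 10 iterations after the 0th iteration", but the values are `[ 0, 29 ]` and `[ 0, 19 ]`, so the comment appears to be left over from a template. The sketch below assumes the pair means `[start_iteration, interval]`, which is what the comment's wording suggests. The function `should_evaluate` is a hypothetical helper for illustration, not PaddleOCR's API, and the real training loop may differ in detail.

```python
def should_evaluate(global_step, eval_batch_step):
    """Assumed rule: evaluate every `interval` iterations once past `start`."""
    start, interval = eval_batch_step
    return global_step > start and (global_step - start) % interval == 0


# Iterations (out of the first 100) at which evaluation would run
# under this assumption for the SER config's value [0, 29].
steps = [s for s in range(1, 101) if should_evaluate(s, [0, 29])]
print(steps)  # [29, 58, 87]
```

Under the same assumption, the RE config's `[ 0, 19 ]` would evaluate at iterations 19, 38, 57 and so on.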