Key information extraction config files #14844

eunij-peanut · 2025-03-12T08:00:51Z


Could someone explain the parameters in the two config files re_vi_layoutxlm_xfund_zh.yml and ser_vi_layoutxlm_xfund_zh.yml in detail, or point me to where they are documented?
ser_vi_layoutxlm_xfund_zh.yml:
```yaml
Global:
  use_gpu: True
  epoch_num: &epoch_num 1000
  log_smooth_window: 10
  print_batch_step: 10
  save_model_dir: ./output/qc_2_ser
  save_epoch_step: 100
  # evaluation is run every 10 iterations after the 0th iteration
  eval_batch_step: [ 0, 29 ] # [ 0, 19 ]
  cal_metric_during_train: False
  save_inference_dir:
  use_visualdl: False
  seed: 2025
  infer_img: /home/aistudio/work/PaddleOCR/train_data/qc_2/imgs/val
  d2s_train_image_shape: [3, 448, 640] # [3, 224, 224]
  # if you want to predict using the groundtruth ocr info,
  # you can use the following config
  # infer_img: /home/aistudio/work/PaddleOCR/train_data/zzsfp/imgs/b25.jpg
  # infer_mode: False

  save_res_path: ./output/ser/qc_2
  kie_rec_model_dir:
  kie_det_model_dir:
  amp_custom_white_list: ['scale', 'concat', 'elementwise_add']

Architecture:
  model_type: kie
  algorithm: &algorithm "LayoutXLM"
  Transform:
  Backbone:
    name: LayoutXLMForSer
    pretrained: True
    ignore_mismatched_sizes: True
    checkpoints:
    # one of base or vi
    mode: vi
    num_classes: &num_classes 77

Loss:
  name: VQASerTokenLayoutLMLoss
  num_classes: *num_classes
  key: "backbone_out"

Optimizer:
  name: AdamW
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Linear
    learning_rate: 0.00005
    epochs: *epoch_num
    warmup_epoch: 2
  regularizer:
    name: L2
    factor: 0.00000

PostProcess:
  name: VQASerTokenLayoutLMPostProcess
  class_path: &class_path /home/aistudio/work/PaddleOCR/train_data/qc_3/class_list.txt

Metric:
  name: VQASerTokenMetric
  main_indicator: hmean

Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/work/PaddleOCR/train_data/qc_3/imgs/train
    label_file_list:
      - /home/aistudio/work/PaddleOCR/train_data/qc_3/Label_ser_train_processed.json
    ratio_list: [ 1.0 ]
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - VQATokenLabelEncode: # Class handling label
          contains_re: False
          algorithm: *algorithm
          class_path: *class_path
          use_textline_bbox_info: &use_textline_bbox_info True
          # one of [None, "tb-yx"]
          order_method: &order_method "tb-yx"
      - VQATokenPad:
          max_seq_len: &max_seq_len 512
          return_attention_mask: True
      - VQASerTokenChunk:
          max_seq_len: *max_seq_len
      - Resize:
          size: [448,640]
      - NormalizeImage:
          scale: 1
          mean: [ 123.675, 116.28, 103.53 ]
          std: [ 58.395, 57.12, 57.375 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels'] # dataloader will return list in this order
  loader:
    shuffle: True
    drop_last: False
    batch_size_per_card: 8
    num_workers: 0

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/work/PaddleOCR/train_data/qc_3/imgs/val
    label_file_list:
      - /home/aistudio/work/PaddleOCR/train_data/qc_3/Label_ser_val_processed.json
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - VQATokenLabelEncode: # Class handling label
          contains_re: False
          algorithm: *algorithm
          class_path: *class_path
          use_textline_bbox_info: *use_textline_bbox_info
          order_method: *order_method
      - VQATokenPad:
          max_seq_len: *max_seq_len
          return_attention_mask: True
      - VQASerTokenChunk:
          max_seq_len: *max_seq_len
      - Resize:
          size: [448,640]
      - NormalizeImage:
          scale: 1
          mean: [ 123.675, 116.28, 103.53 ]
          std: [ 58.395, 57.12, 57.375 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 8
    num_workers: 0
```

re_vi_layoutxlm_xfund_zh.yml:

```yaml
Global:
  use_gpu: True
  epoch_num: &epoch_num 200
  log_smooth_window: 10
  print_batch_step: 10
  save_model_dir: /home/aistudio/work/PaddleOCR/output/qc_1_re
  save_epoch_step: 30
  # evaluation is run every 10 iterations after the 0th iteration
  eval_batch_step: [ 0, 19 ]
  cal_metric_during_train: False
  save_inference_dir:
  use_visualdl: False
  seed: 2022
  infer_img: /home/aistudio/work/PaddleOCR/train_data/qc/imgs/quality_feb_4.png
  save_res_path: ./output/re/xfund_zh/
  kie_rec_model_dir:
  kie_det_model_dir:

Architecture:
  model_type: kie
  algorithm: &algorithm "LayoutXLM"
  Transform:
  Backbone:
    name: LayoutXLMForRe
    pretrained: True
    mode: vi
    checkpoints:

Loss:
  name: LossFromOutput
  key: loss
  reduction: mean

Optimizer:
  name: AdamW
  beta1: 0.9
  beta2: 0.999
  clip_norm: 10
  lr:
    learning_rate: 0.00005 # raw:0.00005
    warmup_epoch: 10
  regularizer:
    name: L2
    factor: 0.00000

PostProcess:
  name: VQAReTokenLayoutLMPostProcess

Metric:
  name: VQAReTokenMetric
  main_indicator: hmean

Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/work/PaddleOCR/train_data/qc/imgs
    label_file_list:
      - /home/aistudio/work/PaddleOCR/train_data/qc/train_kie.json
    ratio_list: [ 1.0 ]
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - VQATokenLabelEncode: # Class handling label
          contains_re: True
          algorithm: *algorithm
          class_path: &class_path /home/aistudio/work/PaddleOCR/train_data/qc/class_list.txt
          use_textline_bbox_info: &use_textline_bbox_info True
          order_method: &order_method "tb-yx"
      - VQATokenPad:
          max_seq_len: &max_seq_len 512
          return_attention_mask: True
      - VQAReTokenRelation:
      - VQAReTokenChunk:
          max_seq_len: *max_seq_len
      - TensorizeEntitiesRelations:
      - Resize:
          size: [224,224]
      - NormalizeImage:
          scale: 1
          mean: [ 123.675, 116.28, 103.53 ]
          std: [ 58.395, 57.12, 57.375 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'input_ids', 'bbox','attention_mask', 'token_type_ids', 'entities', 'relations'] # dataloader will return list in this order
  loader:
    shuffle: True
    drop_last: False
    batch_size_per_card: 32
    num_workers: 0

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/work/PaddleOCR/train_data/qc/imgs
    label_file_list:
      - /home/aistudio/work/PaddleOCR/train_data/qc/val_kie.json
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - VQATokenLabelEncode: # Class handling label
          contains_re: True
          algorithm: *algorithm
          class_path: *class_path
          use_textline_bbox_info: *use_textline_bbox_info
          order_method: *order_method
      - VQATokenPad:
          max_seq_len: *max_seq_len
          return_attention_mask: True
      - VQAReTokenRelation:
      - VQAReTokenChunk:
          max_seq_len: *max_seq_len
      - TensorizeEntitiesRelations:
      - Resize:
          size: [224,224]
      - NormalizeImage:
          scale: 1
          mean: [ 123.675, 116.28, 103.53 ]
          std: [ 58.395, 57.12, 57.375 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'entities', 'relations'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 32
    num_workers: 0
```

GreatV · 2025-03-12T08:09:10Z

Maintainer

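re_vi_layoutxlm_xfund_zh.yml and ser_vi_layoutxlm_xfund_zh.yml are PaddleOCR configuration files for key information extraction (KIE), both based on the LayoutXLM model. The first is for relation extraction (RE); the second is for semantic entity recognition (SER), a token-level sequence labeling task. The main parameters are explained below: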

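1. Global parameters (Global):

use_gpu: whether to train on the GPU (True/False).
epoch_num: total number of training epochs (SER: 1000, RE: 200).
log_smooth_window: window size used to smooth the values shown in the training log.
print_batch_step: print log information every N iterations.
save_model_dir: directory where trained models are saved.
save_epoch_step: save a model every N epochs.
eval_batch_step: evaluation schedule as [start, interval], counted in iterations (training steps), not epochs. For example, [0, 19] means evaluation starts after iteration 0 and then runs every 19 iterations. The comment "evaluation is run every 10 iterations" above this key does not match the values; the values are what take effect.
cal_metric_during_train: whether to compute metrics during training (slows training down).
save_inference_dir: directory for the exported inference model.
use_visualdl: whether to use VisualDL for visual monitoring.
seed: random seed, for reproducible results.
infer_img: path of the image(s) used for prediction.
save_res_path: where prediction results are saved.
kie_rec_model_dir: path of the OCR recognition model used in the KIE task.
kie_det_model_dir: path of the OCR detection model used in the KIE task.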
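A minimal sketch of how eval_batch_step = [start, interval] can be read, assuming the trainer evaluates whenever the global iteration count is past start and (global_step - start) is a multiple of interval. eval_steps is a hypothetical helper for illustration, not a PaddleOCR function:

```python
def eval_steps(eval_batch_step, total_steps):
    """Global steps at which evaluation would run (hypothetical helper)."""
    start, interval = eval_batch_step
    return [
        step
        for step in range(1, total_steps + 1)
        # evaluate once past `start`, then every `interval` iterations
        if step > start and (step - start) % interval == 0
    ]


print(eval_steps([0, 19], 60))  # RE config: [19, 38, 57]
print(eval_steps([0, 29], 60))  # SER config: [29, 58]
```

Because both numbers are iterations, how often evaluation happens per epoch depends on the dataset size and batch_size_per_card.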

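2. Model architecture (Architecture):

model_type: task type, here "kie" (key information extraction).
algorithm: algorithm used, here "LayoutXLM".
Backbone:
- name: backbone class, LayoutXLMForSer or LayoutXLMForRe.
- pretrained: whether to load pretrained weights.
- mode: vi selects the VI-LayoutXLM variant, which drops the visual backbone; the other option is base (see the "one of base or vi" comment).
- num_classes: number of label classes (SER only).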
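The SER num_classes is easiest to get right by counting BIO tags rather than lines in class_list.txt. The sketch below assumes the label map is built the way PaddleOCR's SER label encoding appears to build it: one shared "O" tag for OTHER, plus a B- and an I- tag for every other class. build_bio_label_map is a hypothetical name; check the result against your own class_list.txt:

```python
def build_bio_label_map(class_names):
    """Hypothetical sketch: map class_list.txt entries to BIO tag ids."""
    labels = ["O"]
    for name in class_names:
        # OTHER-like classes collapse into the single shared "O" tag
        if name.upper() in ("OTHER", "OTHERS", "IGNORE"):
            continue
        labels.append("B-" + name.upper())
        labels.append("I-" + name.upper())
    return {label: idx for idx, label in enumerate(labels)}


xfund_classes = ["OTHER", "QUESTION", "ANSWER", "HEADER"]
label2id = build_bio_label_map(xfund_classes)
print(len(label2id))  # 7
print(label2id)

# Working backwards from num_classes: 77 in the SER config above
print((77 - 1) // 2)  # 38 classes besides OTHER
```

Under this assumption, num_classes = 2 x (number of non-OTHER classes) + 1, which gives 7 for XFUND's four classes (the value the upstream XFUND config uses).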

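3. Loss function (Loss):

name: loss class:
- LossFromOutput (RE): does not compute a loss itself; it takes the loss the model has already computed and returned in its output.
- VQASerTokenLayoutLMLoss (SER): token-level classification loss for SER.
key: which entry of the model output the loss uses ("backbone_out" for SER, loss for RE).
reduction: how the loss is reduced, e.g. mean (average).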
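With key: loss and reduction: mean, the RE loss step only selects the loss the backbone already returned and averages it. A plain-Python sketch of that behavior; loss_from_output is a hypothetical stand-in that uses lists instead of Paddle tensors:

```python
def loss_from_output(predicts, key="loss", reduction="mean"):
    """Sketch: pick a precomputed loss from the model output and reduce it."""
    loss = predicts
    # Use the entry named by `key` when the model returns a dict
    if key is not None and isinstance(predicts, dict):
        loss = predicts[key]
    if reduction == "mean":
        loss = sum(loss) / len(loss)
    elif reduction == "sum":
        loss = sum(loss)
    return {"loss": loss}


model_output = {"loss": [0.5, 0.25, 0.75]}  # made-up per-sample losses
print(loss_from_output(model_output))  # {'loss': 0.5}
```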

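4. Optimizer (Optimizer):

name: optimizer type, here AdamW.
beta1, beta2: AdamW's moment decay rates, set to the usual 0.9 and 0.999.
lr: learning-rate settings, including:
- name: scheduler type, Linear in the SER config; the RE config omits it.
- learning_rate: base learning rate (0.00005 in both configs).
- epochs: (SER) length of the schedule, tied to epoch_num through the *epoch_num alias.
- warmup_epoch: number of learning-rate warmup epochs.
clip_norm: gradient clipping threshold (RE).
regularizer: weight-decay regularization; here L2 with factor 0, so it has no effect.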
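A sketch of the SER learning-rate schedule, assuming Linear means a linear warmup from 0 to learning_rate over warmup_epoch epochs, followed by a linear (power-1 polynomial) decay to 0 over epochs. Schedulers of this kind step once per iteration, so epochs are converted to steps first; the 10 iterations per epoch below is made up, and PaddleOCR's exact step accounting may differ:

```python
def linear_lr(step, base_lr, total_steps, warmup_steps, end_lr=0.0):
    """Sketch: linear warmup, then linear decay from base_lr to end_lr."""
    if step < warmup_steps:
        # ramp from 0 up to base_lr
        return base_lr * step / warmup_steps
    # decay counts from the end of warmup and stops at end_lr
    decay_step = min(step - warmup_steps, total_steps)
    return (base_lr - end_lr) * (1 - decay_step / total_steps) + end_lr


# Hypothetical: 10 iterations per epoch; SER values epoch_num=1000, warmup_epoch=2
steps_per_epoch = 10
total = 1000 * steps_per_epoch
warmup = 2 * steps_per_epoch
for step in (0, 10, 20, 5020, total + warmup):
    print(step, linear_lr(step, 0.00005, total, warmup))
```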

5. Post-processing (PostProcess):

name: post-processing class:
- VQASerTokenLayoutLMPostProcess (SER), which also takes class_path to map predicted ids back to label names.
- VQAReTokenLayoutLMPostProcess (RE).

6. Evaluation metric (Metric):

name: metric class:
- VQASerTokenMetric (SER).
- VQAReTokenMetric (RE).
main_indicator: the main metric, used to select the best model; here hmean (F1 score).
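hmean is the harmonic mean of precision and recall, i.e. the F1 score. A small worked example with made-up counts; how VQASerTokenMetric and VQAReTokenMetric count true and false positives is not reproduced here:

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one evaluation run
tp, fp, fn = 80, 20, 40
precision = tp / (tp + fp)  # 0.8
recall = tp / (tp + fn)     # about 0.667
print(round(hmean(precision, recall), 4))  # 0.7273
```

The harmonic mean stays low unless both precision and recall are high, which is why it is a common single indicator for picking the best checkpoint.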

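7. Training dataset (Train) and validation dataset (Eval):

dataset.name: dataset class (SimpleDataSet).
data_dir: directory holding the training/validation images.
label_file_list: annotation files for the training/validation set.
ratio_list: sampling ratio for each file in label_file_list when training on several datasets.
transforms: data-processing pipeline, applied in order:
- DecodeImage: reads the image.
- VQATokenLabelEncode: tokenizes the text and encodes the labels:
  - contains_re: True for RE, False for SER.
  - algorithm: *algorithm (a YAML alias that reuses the algorithm value defined under Architecture).
  - class_path: path to the class list file (used by both configs).
  - use_textline_bbox_info: whether to use text-line bounding-box information.
  - order_method: order in which text segments are arranged ("tb-yx"; the config comment lists None as the other option).
- VQATokenPad: pads the token sequence up to max_seq_len (512); with return_attention_mask: True it also produces the attention mask.
- VQASerTokenChunk: splits the tokens into chunks of at most max_seq_len (SER).
- VQAReTokenRelation: prepares entity relations for relation extraction (RE).
- VQAReTokenChunk: splits tokens, entities and relations into chunks (RE).
- TensorizeEntitiesRelations: converts entities and relations into tensors (RE).
- Resize: resizes the image, to [224, 224] in the RE config or [448, 640] in the SER config.
- NormalizeImage: normalizes the image with a per-channel mean and standard deviation.
- ToCHWImage: converts the image layout from HWC to CHW.
- KeepKeys: fields kept and returned by the dataloader, in this order:
  - [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels'] (SER).
  - [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'entities', 'relations'] (RE).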
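The NormalizeImage values are the standard ImageNet mean and std multiplied by 255: scale: 1 leaves pixels in the 0-255 range, so the statistics are expressed in that range too. The sketch below checks that, then applies the normalization and an HWC-to-CHW reorder to a tiny made-up image using plain lists (the real transforms work on arrays). Resize is not reproduced; reading its size as [height, width] is an assumption, consistent with d2s_train_image_shape: [3, 448, 640] in the SER config. The function names are hypothetical:

```python
# ImageNet statistics in the 0-1 range
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

# With scale: 1 pixels stay in 0-255, so mean/std are the ImageNet values x 255
MEAN = [round(m * 255, 3) for m in IMAGENET_MEAN]  # [123.675, 116.28, 103.53]
STD = [round(s * 255, 3) for s in IMAGENET_STD]    # [58.395, 57.12, 57.375]


def normalize_hwc(image, scale, mean, std):
    """Apply (pixel * scale - mean) / std per channel to an H x W x C nested list."""
    return [
        [[(px[c] * scale - mean[c]) / std[c] for c in range(len(px))] for px in row]
        for row in image
    ]


def hwc_to_chw(image):
    """Reorder an H x W x C nested list into C x H x W, the layout ToCHWImage produces."""
    h, w, c = len(image), len(image[0]), len(image[0][0])
    return [[[image[y][x][ch] for x in range(w)] for y in range(h)] for ch in range(c)]


# A made-up 1 x 2 RGB image (one row, two pixels)
img = [[[123.675, 116.28, 103.53], [255, 255, 255]]]
chw = hwc_to_chw(normalize_hwc(img, 1, MEAN, STD))
print(len(chw), len(chw[0]), len(chw[0][0]))  # 3 1 2
print(chw[0][0][0])  # a pixel equal to the mean normalizes to zero
```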

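8. Data loader parameters (loader):

shuffle: whether to shuffle the data order (True for training, False for evaluation).
drop_last: whether to drop the last incomplete batch (usually False, i.e. keep it).
batch_size_per_card: batch size per GPU:
- SER: 8.
- RE: 32.
num_workers: number of data-loading worker processes (0 here, i.e. data is loaded in the main process).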
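Iterations per epoch follow from the loader settings, which is how the epoch-based settings (epoch_num, warmup_epoch, save_epoch_step) relate to the iteration-based ones (eval_batch_step, print_batch_step). A hypothetical helper, assuming the data is split evenly across cards and using a made-up dataset of 100 samples:

```python
import math


def batches_per_epoch(num_samples, batch_size_per_card, num_cards=1, drop_last=False):
    """Iterations per epoch for a given loader config (hypothetical helper)."""
    global_batch = batch_size_per_card * num_cards
    if drop_last:
        # the last incomplete batch is discarded
        return num_samples // global_batch
    return math.ceil(num_samples / global_batch)


print(batches_per_epoch(100, 8))                  # SER batch size: 13
print(batches_per_epoch(100, 8, drop_last=True))  # 12
print(batches_per_epoch(100, 32))                 # RE batch size: 4
```

With 13 iterations per epoch, for example, eval_batch_step: [0, 29] would evaluate a little more often than every two epochs.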

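Summary

ser_vi_layoutxlm_xfund_zh.yml is for semantic entity recognition (SER): it assigns each piece of text an entity class (in XFUND: question, answer, header or other), a task similar to named entity recognition (NER).
re_vi_layoutxlm_xfund_zh.yml is for relation extraction (RE): it identifies links between entities, such as which answer belongs to which question.
The two files share the same overall structure, but differ in the Backbone, Loss, Optimizer, PostProcess and Metric sections and in parts of the transform pipeline.
For more detailed official descriptions of each parameter, see the PaddleOCR KIE documentation and the code under ppstructure/kie.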

Response generated by 🤖 feifei-bot | chatgpt-4o-latest

⚠️ This is an AI-generated response. Please verify any critical information.

1 reply

eunij-peanut Mar 12, 2025
Author

> Resize: e.g. [224, 224] or [448, 640]; resizes the image.

Where is the code related to Resize?
