-
Notifications
You must be signed in to change notification settings - Fork 5.9k
Open
Labels
Description
bug描述 Describe the Bug
严重bug,paddleocr PPStructureV3每个流水线占用虚拟内存8T,会耗尽虚拟内存导致无法系统异常
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2796503 root 20 0 8213.1g 2.5g 713280 S 107.3 0.3 0:31.23 python
2796496 root 20 0 8213.0g 2.3g 676160 S 107.0 0.3 0:28.59 python
2796499 root 20 0 8213.0g 2.4g 712128 S 107.0 0.3 0:29.92 python
2796506 root 20 0 8213.0g 2.3g 662080 S 106.6 0.3 0:30.32 python
🏃♂️ Environment (运行环境)
昇腾910B+arm环境
官方910B镜像
paddleocr版本3.20
流水线配置文件PP-StructureV3.yaml
SubModules:
ChartRecognition:
batch_size: 4
model_dir: /root/.paddlex/official_models/PP-Chart2Table
model_name: PP-Chart2Table
module_name: chart_recognition
LayoutDetection:
batch_size: 8
layout_merge_bboxes_mode:
0: large
1: large
2: union
3: union
4: union
5: union
6: union
7: large
8: union
9: union
10: union
11: union
12: union
13: union
14: union
15: union
16: large
17: union
18: union
19: union
layout_nms: true
layout_unclip_ratio:
- 1.0
- 1.0
model_dir: /root/.paddlex/official_models/PP-DocLayout_plus-L
model_name: PP-DocLayout_plus-L
module_name: layout_detection
threshold:
0: 0.3
1: 0.5
2: 0.4
3: 0.5
4: 0.5
5: 0.5
6: 0.5
7: 0.3
8: 0.5
9: 0.5
10: 0.5
11: 0.5
12: 0.5
13: 0.5
14: 0.5
15: 0.45
16: 0.5
17: 0.5
18: 0.5
19: 0.5
RegionDetection:
layout_merge_bboxes_mode: small
layout_nms: true
model_dir: /root/.paddlex/official_models/PP-DocBlockLayout
model_name: PP-DocBlockLayout
module_name: layout_detection
SubPipelines:
DocPreprocessor:
SubModules:
DocOrientationClassify:
batch_size: 8
model_dir: /root/.paddlex/official_models/PP-LCNet_x1_0_doc_ori
model_name: PP-LCNet_x1_0_doc_ori
module_name: doc_text_orientation
DocUnwarping:
model_dir: /root/.paddlex/official_models/UVDoc
model_name: UVDoc
module_name: image_unwarping
batch_size: 8
pipeline_name: doc_preprocessor
use_doc_orientation_classify: false
use_doc_unwarping: false
FormulaRecognition:
SubModules:
FormulaRecognition:
batch_size: 8
model_dir: /root/.paddlex/official_models/PP-FormulaNet_plus-L
model_name: PP-FormulaNet_plus-L
module_name: formula_recognition
batch_size: 8
pipeline_name: formula_recognition
use_doc_preprocessor: false
use_layout_detection: false
GeneralOCR:
SubModules:
TextDetection:
box_thresh: 0.6
limit_side_len: 736
limit_type: min
max_side_limit: 4000
model_dir: /root/.paddlex/official_models/PP-OCRv5_server_det
model_name: PP-OCRv5_server_det
module_name: text_detection
thresh: 0.3
unclip_ratio: 1.5
TextLineOrientation:
batch_size: 8
model_dir: /root/.paddlex/official_models/PP-LCNet_x1_0_textline_ori
model_name: PP-LCNet_x1_0_textline_ori
module_name: textline_orientation
TextRecognition:
batch_size: 8
model_dir: /root/.paddlex/official_models/PP-OCRv5_server_rec
model_name: PP-OCRv5_server_rec
module_name: text_recognition
score_thresh: 0.0
batch_size: 8
pipeline_name: OCR
text_type: general
use_doc_preprocessor: false
use_textline_orientation: false
SealRecognition:
SubPipelines:
SealOCR:
SubModules:
TextDetection:
box_thresh: 0.6
limit_side_len: 736
limit_type: min
max_side_limit: 4000
model_dir: /root/.paddlex/official_models/PP-OCRv4_server_seal_det
model_name: PP-OCRv4_server_seal_det
module_name: seal_text_detection
thresh: 0.2
unclip_ratio: 0.5
TextRecognition:
batch_size: 8
model_dir: /root/.paddlex/official_models/PP-OCRv5_server_rec
model_name: PP-OCRv5_server_rec
module_name: text_recognition
score_thresh: 0
batch_size: 8
pipeline_name: OCR
text_type: seal
use_doc_preprocessor: false
use_textline_orientation: false
batch_size: 8
pipeline_name: seal_recognition
use_doc_preprocessor: false
use_layout_detection: false
TableRecognition:
SubModules:
TableClassification:
model_dir: /root/.paddlex/official_models/PP-LCNet_x1_0_table_cls
model_name: PP-LCNet_x1_0_table_cls
module_name: table_classification
TableOrientationClassify:
model_dir: /root/.paddlex/official_models/PP-LCNet_x1_0_doc_ori
model_name: PP-LCNet_x1_0_doc_ori
module_name: doc_text_orientation
WiredTableCellsDetection:
model_dir: /root/.paddlex/official_models/RT-DETR-L_wired_table_cell_det
model_name: RT-DETR-L_wired_table_cell_det
module_name: table_cells_detection
WiredTableStructureRecognition:
model_dir: /root/.paddlex/official_models/SLANeXt_wired
model_name: SLANeXt_wired
module_name: table_structure_recognition
WirelessTableCellsDetection:
model_dir: /root/.paddlex/official_models/RT-DETR-L_wireless_table_cell_det
model_name: RT-DETR-L_wireless_table_cell_det
module_name: table_cells_detection
WirelessTableStructureRecognition:
model_dir: /root/.paddlex/official_models/SLANet_plus
model_name: SLANet_plus
module_name: table_structure_recognition
SubPipelines:
GeneralOCR:
SubModules:
TextDetection:
box_thresh: 0.4
limit_side_len: 736
limit_type: min
max_side_limit: 4000
model_dir: /root/.paddlex/official_models/PP-OCRv5_server_det
model_name: PP-OCRv5_server_det
module_name: text_detection
thresh: 0.3
unclip_ratio: 1.5
TextLineOrientation:
batch_size: 8
model_dir: /root/.paddlex/official_models/PP-LCNet_x1_0_textline_ori
model_name: PP-LCNet_x1_0_textline_ori
module_name: textline_orientation
TextRecognition:
batch_size: 8
model_dir: /root/.paddlex/official_models/PP-OCRv5_server_rec
model_name: PP-OCRv5_server_rec
module_name: text_recognition
pipeline_name: OCR
score_thresh: 0.0
text_type: general
use_doc_preprocessor: false
use_textline_orientation: false
pipeline_name: table_recognition_v2
use_doc_preprocessor: false
use_layout_detection: false
use_ocr_model: false
batch_size: 8
pipeline_name: PP-StructureV3
use_chart_recognition: false
use_doc_preprocessor: false
use_formula_recognition: true
use_region_detection: true
use_seal_recognition: false
use_table_recognition: true
🌰 Minimal Reproducible Example (最小可复现问题的Demo)
from pathlib import Path
from paddleocr import PPStructureV3
pipeline = PPStructureV3(
use_doc_orientation_classify=False,
use_doc_unwarping=False
)
# For Image
output = pipeline.predict(
input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png",
)
# Visualize the results and save the JSON results
for res in output:
res.print()
res.save_to_json(save_path="output")
res.save_to_markdown(save_path="output")其他补充信息 Additional Supplementary Information
已经在8.0.0和8.0.rc2两个版本的官方npu镜像上测试过,都会出现virt占用超8T的问题