latest/en/version3.x/pipeline_usage/PP-StructureV3 #16212
Replies: 17 comments 8 replies
-
|
Here is my script:

    from paddleocr import PPStructureV3

    input_file = "3ul2Rq4Sk5Cn-l69D695U.png"
    pipeline = PPStructureV3(
        # models
        text_recognition_model_name="PP-OCRv5_mobile_rec",
        text_detection_model_name="PP-OCRv5_mobile_det",
        # table_classification_model_name="PP-LCNet_x1_0_table_cls",
        # wired_table_cells_detection_model_name="SLANeXt_wired",
        formula_recognition_model_name="PP-OCRv5_mobile_rec",
        layout_detection_model_name="PP-DocBlockLayout",
        region_detection_model_name="PP-DocBlockLayout",
        # model dirs
        text_recognition_model_dir='/mnt/public/public_file/models/PP-OCRv5_mobile_rec',
        text_detection_model_dir='/mnt/public/public_file/models/PP-OCRv5_mobile_det',
        # table_classification_model_dir='/mnt/public/public_file/models/PP-LCNet_x1_0_table_cls',
        # wired_table_cells_detection_model_dir="/mnt/public/public_file/models/SLANeXt_wired",
        formula_recognition_model_dir='/mnt/public/public_file/models/PP-OCRv5_mobile_rec',
        layout_detection_model_dir='/mnt/public/public_file/models/PP-DocBlockLayout',
        region_detection_model_dir='/mnt/public/public_file/models/PP-DocBlockLayout',
        # use options
        use_doc_orientation_classify=False,  # enable/disable the document orientation classification model
        use_doc_unwarping=False,  # enable/disable the document unwarping module
        use_textline_orientation=False,  # enable/disable the textline orientation classification model
        use_seal_recognition=False,
        use_table_recognition=False,
        use_chart_recognition=False,
        device="gpu",  # run model inference on the GPU
    )
    output = pipeline.predict(input=input_file)
    for res in output:
        res.print()  # print the structured prediction output
        res.save_to_json(save_path="output")  # save the current image's structured result as JSON
        res.save_to_markdown(save_path="output")

Why is my output Markdown file blank? |
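One thing that may be worth checking in the script above is that the same model name is assigned to more than one module: `PP-OCRv5_mobile_rec` serves both text recognition and formula recognition, and `PP-DocBlockLayout` serves both layout detection and region detection. Whether that explains the blank Markdown is not established in this thread. The sketch below only makes such overlaps visible; the helper name `roles_sharing_a_model` is hypothetical and not part of PaddleOCR.

```python
from collections import defaultdict


def roles_sharing_a_model(model_names):
    """Return {model_name: [roles...]} for models configured under several roles.

    `model_names` maps a role (the prefix of a `*_model_name` argument)
    to the model name configured for it.
    """
    by_model = defaultdict(list)
    for role, name in model_names.items():
        by_model[name].append(role)
    return {name: roles for name, roles in by_model.items() if len(roles) > 1}


# Values copied from the `*_model_name` arguments in the script above.
config = {
    "text_recognition": "PP-OCRv5_mobile_rec",
    "text_detection": "PP-OCRv5_mobile_det",
    "formula_recognition": "PP-OCRv5_mobile_rec",
    "layout_detection": "PP-DocBlockLayout",
    "region_detection": "PP-DocBlockLayout",
}

for name, roles in roles_sharing_a_model(config).items():
    print(f"{name} is configured for: {', '.join(roles)}")
```

If a reported overlap is unintended, the affected `*_model_name` and `*_model_dir` arguments are the place to look first.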
-
|
Where are the category IDs in the dict below defined? layout_threshold: score threshold for the layout model. |
-
|
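Hello, I'd like to ask whether layout detection can automatically ignore seals during recognition. When running OCR on contract documents, how can I handle text that is covered by a seal, which causes the OCR output to include part of the seal text? At the same time, recognition of the seal text fails because it is occluded by the underlying contract text. Is it possible to separate the two and process them independently? Many thanks! |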
-
|
Hello, I have a question: when using PP-StructureV3, how do I adjust the threshold parameter for cell detection within table recognition? |
-
|
Formulas come out garbled: |
-
|
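I have table extraction turned off, but when a table contains formulas, those formulas get cut out of the table image. What can I do about this? |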
-
|
Hi :) |
-
|
Can the models in this pipeline be quantized to a lower precision, such as int8?

    from paddleocr import PPStructureV3 |
-
|
Hello, can this model be used concurrently? |
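A common, conservative pattern for this kind of question is to give each worker thread its own pipeline instance rather than sharing one. The sketch below assumes that a single instance should not be shared across threads, which is an assumption here and not something confirmed in this thread. `FakePipeline` is a hypothetical stand-in that only mimics the `predict(input=...)` call used earlier in the thread, so the example runs without PaddleOCR installed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()  # holds one pipeline instance per worker thread


def get_pipeline(factory):
    """Create the pipeline lazily, once per thread, and reuse it afterwards."""
    if not hasattr(_local, "pipeline"):
        _local.pipeline = factory()
    return _local.pipeline


def predict_one(factory, path):
    """Run one prediction on the calling thread's own pipeline instance."""
    pipeline = get_pipeline(factory)
    return list(pipeline.predict(input=path))


def run_concurrently(factory, paths, workers=2):
    """Process all paths with a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: predict_one(factory, p), paths))


class FakePipeline:
    """Hypothetical stand-in for a real pipeline object (not a PaddleOCR class)."""

    def predict(self, input):
        yield f"result for {input}"


results = run_concurrently(FakePipeline, ["a.png", "b.png", "c.png"])
print(results)
```

With a real pipeline, `factory` would be a function that constructs it. Each instance loads its own models, so memory use grows with the number of workers.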
-
|
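Hello, I'd like to ask whether layout detection can automatically ignore seals during recognition. When running OCR on contract documents, how can I handle text that is covered by a seal, which causes the OCR output to include part of the seal text? At the same time, recognition of the seal text fails because it is occluded by the underlying contract text. Is it possible to separate the two and process them independently? Many thanks! |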
-
|
I'm a little lost. How can I use PaddleOCR's PPStructureV3 completely locally, with the models loaded from a specific directory?

    from paddleocr import PPStructureV3
    pipeline = PPStructureV3( )

"models" is my local folder for the models downloaded for PPStructureV3.

And how can I disable the download validation that runs every time the code is executed? It's annoying to wait several seconds for the files to be validated before the process can even start. |
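For the first part of this question, the script at the top of this thread passes a `*_model_name` and a `*_model_dir` argument per module. A minimal sketch of generating those arguments from one local folder follows. It assumes each model sits in a sub-folder named after the model, and the helpers `build_model_kwargs` and `missing_model_dirs` are hypothetical, not part of PaddleOCR. It does not address the download validation.

```python
from pathlib import Path


def build_model_kwargs(models_root, modules):
    """Return `<module>_model_name` / `<module>_model_dir` keyword arguments.

    `modules` maps a module prefix (e.g. "text_detection") to a model name.
    Assumption: the model files live in `<models_root>/<model name>`.
    """
    root = Path(models_root)
    kwargs = {}
    for module, model_name in modules.items():
        kwargs[f"{module}_model_name"] = model_name
        kwargs[f"{module}_model_dir"] = str(root / model_name)
    return kwargs


def missing_model_dirs(kwargs):
    """List the `*_model_dir` keys whose directory does not exist on disk."""
    return [
        key
        for key, value in kwargs.items()
        if key.endswith("_model_dir") and not Path(value).is_dir()
    ]


kwargs = build_model_kwargs(
    "models",
    {
        "text_detection": "PP-OCRv5_mobile_det",
        "text_recognition": "PP-OCRv5_mobile_rec",
    },
)
print(kwargs)
print("missing:", missing_model_dirs(kwargs))

# With PaddleOCR installed, the result could then be passed on, for example:
# from paddleocr import PPStructureV3
# pipeline = PPStructureV3(**kwargs)
```

Checking `missing_model_dirs` before constructing the pipeline makes a mistyped folder name visible early. The remaining modules would be added to the mapping in the same way.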
-
|
I'm getting this error. My code:

    from paddleocr import PPStructureV3

    pipeline = PPStructureV3(
        lang='pt',
        use_doc_orientation_classify=True,
        use_doc_unwarping=True,
        use_textline_orientation=True,
    )

Any help? |
-
|
Table detection and text detection both work well, but the HTML that gets constructed is not good. Why does this happen? |
-
|
I want to ask when Vietnamese text will be supported. |
-
|
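Why is there no single office_model parameter? There are seven models, so I have to write the path seven times. Also, no parameter is given for configuring where the default models are downloaded, which is a hassle; everything ended up on my C: drive. |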
-
|
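Why does it feel like your code is always incompatible with one thing or another? The code you maintain is like a pile of crap. Can't you assign someone to clean it up? Even after installing every library exactly as you require, I still can't get your PDF-processing example code to run cleanly... even though I think PaddleOCR itself is great.

    File "D:\VSstudioProjects\paddleocr_gpu.venv\Lib\site-packages\paddlex\inference\pipelines\layout_parsing\pipeline_v2.py", line 496, in standardized_data |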
-
|
How do I set the return_word_box parameter to True? In GeneralOCR, I: |


-
latest/en/version3.x/pipeline_usage/PP-StructureV3
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://www.paddleocr.ai/latest/en/version3.x/pipeline_usage/PP-StructureV3.html