PPStructureV3 Markdown output missing detected blocks (e.g., names, headers) #16119
Replies: 1 comment
-
Finished |
Bug Description
I'm using PPStructureV3 from PaddleOCR to extract structured data from resume images and generate Markdown output using .save_to_markdown(). The OCR detects all text blocks correctly (verified in JSON), but some of these blocks (like the candidate name or contact info) are not included in the Markdown file.
Impact
This issue results in:
Code
```python
from paddleocr import PPStructureV3

# Disable the optional document preprocessing steps.
pipeline = PPStructureV3(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
)

output = pipeline.predict(
    input="/home/dell/advance_ocr/data/resume4.jpg",
)

# Print each result and save it as both JSON and Markdown.
for res in output:
    res.print()
    res.save_to_json(save_path="output")
    res.save_to_markdown(save_path="output")
```
What I Expect from Markdown Output
I expect the Markdown output to contain all text blocks detected in the JSON output, even if they are not headers or structured sections, so that no content like names or project titles is lost.
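To pin down exactly which blocks are dropped, the saved JSON and Markdown files can be compared directly. The following is a minimal sketch, not part of PaddleOCR: find_missing_blocks is a hypothetical helper, and the key names block_content and block_label are assumptions about the entries in parsing_res_list that may differ between PaddleOCR versions (both are overridable parameters). It uses only the standard library and takes the two file paths on the command line.

```python
import json
import re
import sys
from pathlib import Path


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line wrapping does not cause false mismatches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_missing_blocks(result: dict, markdown: str,
                        content_key: str = "block_content",
                        label_key: str = "block_label") -> list:
    """Return the parsing_res_list blocks whose text does not appear in the Markdown.

    content_key and label_key are ASSUMED key names; adjust them to match
    the JSON that your PaddleOCR version actually writes.
    """
    md_norm = _normalize(markdown)
    missing = []
    for block in result.get("parsing_res_list", []):
        raw = block.get(content_key, "")
        content = _normalize(str(raw))
        # Skip empty blocks (e.g. images); report text that is absent from the Markdown.
        if content and content not in md_norm:
            missing.append({"label": block.get(label_key), "content": raw})
    return missing


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_blocks.py <result.json> <result.md>
    result = json.loads(Path(sys.argv[1]).read_text(encoding="utf-8"))
    markdown = Path(sys.argv[2]).read_text(encoding="utf-8")
    for block in find_missing_blocks(result, markdown):
        print(f"[{block['label']}] {block['content']!r}")
```

The comparison is a plain substring match on normalized text, so blocks that the Markdown writer reformats (for example, tables rendered as HTML) may be reported as missing even though their content is present. The labels it prints show which block types are being dropped, which is useful detail to include in the report.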
Example
In the parsing_res_list JSON, I can see a block like: