PPStructureV3 Markdown output missing detected blocks (e.g., names, headers) #16119

Kishoreatul · 2025-07-23T08:54:30Z

Bug Description

I'm using PPStructureV3 from PaddleOCR to extract structured data from resume images and generate Markdown output using .save_to_markdown(). The OCR detects all text blocks correctly (verified in JSON), but some of these blocks (like candidate name or contact info) are not included in the Markdown file.

Impact

This issue results in:

- Certain blocks missing from the Markdown output
- Loss of essential information (like candidate identity or project sections)
- Inability to trust the Markdown output for downstream NLP processing

Code

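```python
from pathlib import Path
from paddleocr import PPStructureV3

# Build the pipeline with orientation classification and unwarping disabled.
pipeline = PPStructureV3(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
)

# Run layout parsing on a single resume image.
output = pipeline.predict(
    input="/home/dell/advance_ocr/data/resume4.jpg",
)

# Print each result and save it as both JSON and Markdown.
for res in output:
    res.print()
    res.save_to_json(save_path="output")
    res.save_to_markdown(save_path="output")
```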

What I Expect from Markdown Output

I expect the Markdown output to contain all text blocks detected in the JSON output — even if they are not headers or structured sections — so that no content like names or project titles is lost.

Example

In the parsing_res_list JSON, I can see a block like:

{ "block_label": "header", "block_content": "Chandan Kumar"}
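
One way to pin down exactly which detected blocks were dropped is to compare each `block_content` from `parsing_res_list` against the text of the generated Markdown file. The following is only a minimal sketch, not part of PaddleOCR: `normalize` and `find_missing_blocks` are hypothetical helpers, the sample data is made up to mirror the block above, and the check assumes that each block is a dict with `block_label` and `block_content` keys and that a block's text would appear verbatim (apart from whitespace and case) in the Markdown.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so formatting differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_missing_blocks(parsing_res_list, markdown_text):
    """Return the blocks whose text does not appear in the Markdown output.

    Assumes each block is a dict with "block_label" and "block_content" keys,
    as in the JSON example above.
    """
    haystack = normalize(markdown_text)
    missing = []
    for block in parsing_res_list:
        content = normalize(block.get("block_content", ""))
        # Skip empty blocks (for example, images without text).
        if content and content not in haystack:
            missing.append(block)
    return missing


# Hypothetical sample data mirroring the block shown above.
sample_blocks = [
    {"block_label": "header", "block_content": "Chandan Kumar"},
    {"block_label": "text", "block_content": "Python developer"},
]
sample_markdown = "Python developer\n"

for block in find_missing_blocks(sample_blocks, sample_markdown):
    print(block["block_label"], "->", block["block_content"])
```

With the sample data, only the `header` block is reported. To run this against real output, the saved JSON would be loaded with `json.load` and its `parsing_res_list` entry passed in together with the contents of the `.md` file; where exactly that key sits in the saved JSON is an assumption to verify. Because this is a plain substring match, blocks that the Markdown renders as tables or HTML may be reported as missing even when they are present.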

Kishoreatul · 2025-07-24T04:44:14Z

Finished
