Replies: 1 comment
Keeping images inside tables exactly where they appear in the source PDFs is essential to avoid corrupting the information content.
Description:
Currently, Docling handles all images on a page the same way during PDF-to-Markdown conversion, whether or not they sit inside a table. As a result, images within tables are not kept in their original cells during table recognition, and the resulting Markdown no longer matches the source document, especially when images are exported as URLs.
I propose enhancing Docling's table recognition functionality to embed images (or their URLs) within the corresponding table cells in the Markdown output. This will ensure that the structure of the original PDF table, including images, is faithfully reproduced.
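To make the proposed output concrete, here is a minimal sketch of how a table whose cells hold either text or an image reference could be rendered to Markdown. It does not use Docling's API: `ImageRef`, `render_cell`, and `table_to_markdown` are hypothetical names introduced only for illustration, and the first row is assumed to be the header.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ImageRef:
    """Hypothetical stand-in for an image found inside a table cell."""
    url: str
    alt: str = "image"


# A cell is either plain text or a reference to an image.
Cell = Union[str, ImageRef]


def render_cell(cell: Cell) -> str:
    """Render one cell; images become inline Markdown image links."""
    if isinstance(cell, ImageRef):
        return f"![{cell.alt}]({cell.url})"
    # Escape pipes and flatten newlines so the text cannot break the table row.
    return cell.replace("|", "\\|").replace("\n", " ")


def table_to_markdown(rows: List[List[Cell]]) -> str:
    """Render rows as a pipe table, treating the first row as the header."""
    header, *body = rows
    lines = ["| " + " | ".join(render_cell(c) for c in header) + " |"]
    lines.append("| " + " | ".join("---" for _ in header) + " |")
    for row in body:
        lines.append("| " + " | ".join(render_cell(c) for c in row) + " |")
    return "\n".join(lines)


rows = [
    ["Part", "Photo"],
    ["Bolt", ImageRef("images/bolt.png", "bolt")],
]
print(table_to_markdown(rows))
```

With this layout the image URL stays in the same row and column as in the source table instead of being emitted before or after it.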
Expected Behavior:
Images within tables in a PDF should be correctly identified as part of the table.
When converting PDF to Markdown:
If an image is in a table cell, its URL should appear within the corresponding table cell in the Markdown output.
If an image is outside a table, it should be handled as it is currently (positioned relative to its original location).
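One possible way to decide whether an image belongs to a table cell is to compare bounding boxes. The sketch below is an assumption about how this could work, not a description of Docling's internals: `BBox`, `overlap_fraction`, `find_cell_for_image`, and the 0.8 threshold are all hypothetical, and coordinates are assumed to have a top-left origin.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class BBox(NamedTuple):
    """Axis-aligned box with a top-left origin (top < bottom)."""
    left: float
    top: float
    right: float
    bottom: float


def overlap_fraction(inner: BBox, outer: BBox) -> float:
    """Fraction of `inner`'s area that lies inside `outer`."""
    w = min(inner.right, outer.right) - max(inner.left, outer.left)
    h = min(inner.bottom, outer.bottom) - max(inner.top, outer.top)
    if w <= 0 or h <= 0:
        return 0.0
    area = (inner.right - inner.left) * (inner.bottom - inner.top)
    return (w * h) / area if area > 0 else 0.0


def find_cell_for_image(
    image: BBox,
    cells: Sequence[Tuple[int, int, BBox]],
    threshold: float = 0.8,  # assumed cutoff; would need tuning on real documents
) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the cell containing most of the image, else None."""
    best, best_frac = None, threshold
    for row, col, box in cells:
        frac = overlap_fraction(image, box)
        if frac >= best_frac:
            best, best_frac = (row, col), frac
    return best


cells = [(0, 0, BBox(0, 0, 100, 50)), (0, 1, BBox(100, 0, 200, 50))]
print(find_cell_for_image(BBox(110, 5, 190, 45), cells))      # image inside cell (0, 1)
print(find_cell_for_image(BBox(300, 300, 350, 350), cells))   # image outside the table
```

An image that matches a cell would be emitted inside that cell; a `None` result means it falls outside every cell and keeps the current page-level handling.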
Benefits:
Improves the fidelity of PDF-to-Markdown conversion.
Ensures that image URLs maintain their original context, especially in complex documents with tables.