Enhance Docling to Preserve Image Position in Tables when Converting PDF to Markdown #780

TigarHe · 2025-01-21T07:41:39Z


Description:

Currently, Docling handles every image on a page the same way during PDF-to-Markdown conversion, whether or not the image sits inside a table. As a result, table recognition does not keep images in their original cells, and the resulting Markdown diverges from the source, especially when images are exported as URLs.
I propose enhancing Docling's table recognition functionality to embed images (or their URLs) within the corresponding table cells in the Markdown output. This will ensure that the structure of the original PDF table, including images, is faithfully reproduced.

Expected Behavior:
- Images within tables in a PDF should be correctly identified as part of the table.
- When converting PDF to Markdown:
  - If an image is in a table cell, its URL should appear within the corresponding table cell in the Markdown output.
  - If an image is outside a table, it should be handled as it is currently (positioned relative to its original location).
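
To make the expected behavior concrete, here is a minimal sketch of one way it could work: assign each image to the table cell whose bounding box contains the image's center, and emit the image link inside that cell. This is not Docling's API. `Box`, `Cell`, `Image`, `find_cell`, and `table_to_markdown` are hypothetical names invented for illustration, and the center-point rule is only an assumption about how containment might be decided.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in page coordinates (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains_point(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom

    def center(self) -> tuple[float, float]:
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


@dataclass
class Cell:
    """One recognized table cell (hypothetical type)."""
    row: int
    col: int
    box: Box
    text: str = ""


@dataclass
class Image:
    """One extracted image with its exported URL (hypothetical type)."""
    url: str
    box: Box


def find_cell(image: Image, cells: list[Cell]) -> Optional[Cell]:
    """Return the first cell whose box contains the image's center, or None."""
    cx, cy = image.box.center()
    for cell in cells:
        if cell.box.contains_point(cx, cy):
            return cell
    return None


def table_to_markdown(cells: list[Cell], images: list[Image]) -> tuple[str, list[Image]]:
    """Render a Markdown table with in-cell image links.

    Also returns the images that fall outside every cell, so the caller
    can keep handling them the way it does today.
    """
    n_rows = max(c.row for c in cells) + 1
    n_cols = max(c.col for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in cells:
        grid[cell.row][cell.col] = cell.text

    outside = []
    for image in images:
        cell = find_cell(image, cells)
        if cell is None:
            outside.append(image)
            continue
        link = f"![]({image.url})"
        existing = grid[cell.row][cell.col]
        grid[cell.row][cell.col] = f"{existing} {link}".strip()

    # First row is treated as the header row, followed by the separator.
    lines = ["| " + " | ".join(grid[0]) + " |",
             "| " + " | ".join("---" for _ in range(n_cols)) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in grid[1:]]
    return "\n".join(lines), outside
```

Called with the cells of one table and all images on the page, the function returns the table text plus the list of images that did not land in any cell. A real implementation would also need to deal with merged cells and images that overlap several cells, which this sketch ignores.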

Benefits:
- Improves the fidelity of PDF-to-Markdown conversion.
- Ensures that image URLs maintain their original context, especially in complex documents with tables.

ChandanKSahu · 2025-06-02T14:43:01Z


Keeping images in the table cells where they appear in the source PDF is essential. Otherwise the converted document loses part of the information the table carries.
