Docling not able to convert complex table into correct layout structure #2241

aruneshtobrakab · 2025-09-10T10:29:56Z

I converted a document into a lossless JSON structure using Docling (via the DoclingDocument object).
From that object, I then inspected the table structure Docling inferred for some of the tables.
The PDF contains complex tables on pages 6, 8, and 10 that do not follow a standard layout.
The Docling output fails to preserve the original table structure in several critical ways.

Original Table in Document:

Docling's JSON-to-Table Conversion:

The key issues with the conversion are:

Incorrect Header Identification: The first row of the table was incorrectly identified as a data row instead of a column header.
Flawed Structural Assumption: The pipeline assumed a simple table structure, enforcing an equal number of columns for every row. This is incorrect, as the original table has rows with varying column spans (e.g., the 4th row has two major columns, while others have four).
Inaccurate Column Count: The conversion resulted in 8 columns, whereas the original visual layout clearly contains 6 primary columns.
T-14.150.130-SH1-O_short.pdf

@aruneshtobrakab · 2025-09-10T10:36:30Z

dosubot[bot]
bot Sep 10, 2025

Hi @aruneshtobrakab! I'm Dosu and I’m helping the docling team.

Docling's current table extraction has known limitations with complex tables—especially those with merged cells, multi-level headers, or rows with varying column counts. The extraction pipeline often enforces a rectangular grid and may misidentify headers or flatten nuanced layouts, which leads to the issues you described. This is a common pain point and has been reported by other users as well (example, example, example).

Recent updates in docling-core have introduced "rich tables" support (see version 1.6.0+), adding a RichTableCell class with properties like row_span, col_span, and positional offsets, which allows the internal model to represent merged cells and complex layouts more accurately (details). Serializers for HTML, Markdown, and Doctags have been updated to better preserve this structure during export, but CSV and DataFrame exports may still flatten or lose some of this information.

You can access the full table cell metadata (including spans and header flags) directly from the DoclingDocument JSON. For best fidelity, consider working with the JSON or HTML/Markdown exports, and reconstructing the table layout using the row_span and col_span fields. If you need to isolate specific pages or tables, the new filter method on DoclingDocument can help (see PR #378).
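As a rough illustration of reconstructing a layout from that metadata, here is a minimal sketch that renders one table's `data` dictionary to HTML while keeping merged cells. It assumes each entry in `table_cells` carries `text`, `start_row_offset_idx`, `start_col_offset_idx`, `row_span`, `col_span`, and the `column_header` / `row_header` flags, and that `num_rows` sits alongside `table_cells`; check these field names against your own exported JSON. The function name `table_to_html` and the sample data are hypothetical.

```python
from html import escape


def table_to_html(table_data: dict) -> str:
    """Render a table 'data' dict to HTML, preserving row/column spans."""
    # Group cells by starting row. A merged cell may be listed more than
    # once in some exports, so each (row, col) origin is emitted only once.
    rows: dict = {}
    seen = set()
    for cell in table_data["table_cells"]:
        origin = (cell["start_row_offset_idx"], cell["start_col_offset_idx"])
        if origin in seen:
            continue
        seen.add(origin)
        rows.setdefault(origin[0], []).append(cell)

    lines = ["<table>"]
    for r in range(table_data["num_rows"]):
        cells = sorted(rows.get(r, []), key=lambda c: c["start_col_offset_idx"])
        parts = []
        for c in cells:
            is_header = c.get("column_header") or c.get("row_header")
            tag = "th" if is_header else "td"
            row_span = c.get("row_span", 1)
            col_span = c.get("col_span", 1)
            attrs = ""
            if row_span > 1:
                attrs += f' rowspan="{row_span}"'
            if col_span > 1:
                attrs += f' colspan="{col_span}"'
            text = escape(c.get("text", ""))
            parts.append(f"<{tag}{attrs}>{text}</{tag}>")
        lines.append("  <tr>" + "".join(parts) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical sample: one header cell spanning two columns, then two data cells.
sample = {
    "num_rows": 2,
    "num_cols": 2,
    "table_cells": [
        {"text": "Header", "start_row_offset_idx": 0, "start_col_offset_idx": 0,
         "row_span": 1, "col_span": 2, "column_header": True},
        {"text": "a", "start_row_offset_idx": 1, "start_col_offset_idx": 0},
        {"text": "b", "start_row_offset_idx": 1, "start_col_offset_idx": 1},
    ],
}
html_out = table_to_html(sample)
```

With a real export, the same function could be applied to each entry of the top-level `tables` list (for example `table_to_html(doc["tables"][0]["data"])` after `json.load`), assuming that layout matches your file.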

There are still no configuration flags to force value propagation for multi-column cells, and some edge cases require manual post-processing. If you want to experiment, try toggling between "accurate" and "fast" table extraction modes, or disabling cell matching (do_cell_matching=False)—sometimes this helps with merged cells, but results vary (reference).
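For anyone who wants to try those toggles, the sketch below shows one way to set them, assuming docling's `PdfPipelineOptions`, `TableFormerMode`, and `PdfFormatOption` behave as described in its documentation for your installed version. The helper name `build_table_converter` is hypothetical, and the imports are placed inside the function so that defining it does not itself require docling.

```python
def build_table_converter(accurate: bool = True, cell_matching: bool = False):
    """Return a DocumentConverter with custom table-structure settings.

    Hypothetical helper; docling is imported lazily when it is called.
    """
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import (
        PdfPipelineOptions,
        TableFormerMode,
    )
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = PdfPipelineOptions(do_table_structure=True)
    # Switch between the "accurate" and "fast" table structure models.
    options.table_structure_options.mode = (
        TableFormerMode.ACCURATE if accurate else TableFormerMode.FAST
    )
    # When False, cells predicted by the table model are used as-is
    # instead of being matched back to the PDF's own text cells.
    options.table_structure_options.do_cell_matching = cell_matching

    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=options),
        }
    )
```

Calling `build_table_converter().convert(path)` on the attached PDF and comparing `result.document.tables` across the four combinations of the two flags is a cheap way to see whether either setting changes the column count for the affected pages.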

If you need to preserve the most complex layouts, manual reconstruction from the JSON (using the full cell metadata) is currently the most reliable approach. The maintainers are aware of these limitations and are working on improvements, but perfect preservation of all complex table structures is not yet guaranteed (discussion).

To reply, just mention @dosu.


2 replies

aruneshtobrakab Sep 10, 2025
Author

Can using VLMs within Docling help with this issue?

dosubot[bot] bot Sep 10, 2025

Yes, using Vision-Language Models (VLMs) within Docling can help with extracting complex tables, especially those with merged cells, multi-level headers, and irregular layouts. Docling supports transformer-based VLMs like SMOLDOCLING_VLLM, GRANITE_VISION_VLLM, and GOT2_TRANSFORMERS, which use deep learning to interpret table structure directly from images and often produce more accurate Markdown or HTML output than traditional OCR or rule-based methods. To leverage this, configure your pipeline to use a VLM backend (e.g., set vlm_model to SMOLDOCLING_VLLM), enable generate_page_images, and set force_backend_text to True for image-to-markdown conversion of complex tables.

However, the quality of extraction depends on the VLM model's capabilities and the prompt you provide. Some edge cases—like low-contrast scans or highly nonstandard tables—may still require manual post-processing or prompt tuning. Also, not all VLMs support advanced table enrichment options, and some configuration flags available in the standard pipeline (like table-mode=accurate) are not available in the VLM pipeline; the output is determined by the model and your prompt instructions. For best results, use a VLM server that implements the OpenAI API format (such as vllm or llama-server) and craft prompts that explicitly request semantic tables with correct row/col spans and structure. More details and code examples are available in the Docling discussions and issue threads.
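A minimal sketch of switching to the VLM pipeline, assuming the `VlmPipeline` class and the `pipeline_cls` argument of `PdfFormatOption` are available in your docling version. It uses whatever default model the pipeline ships with; the specific model names and flags mentioned above are not set here because their exact spelling should be checked against the installed release. The helper name `build_vlm_converter` is hypothetical, and imports are lazy so the definition alone does not require docling.

```python
def build_vlm_converter():
    """Return a DocumentConverter that routes PDFs through the VLM pipeline.

    Hypothetical helper; relies on the pipeline's default VLM model.
    """
    from docling.datamodel.base_models import InputFormat
    from docling.document_converter import DocumentConverter, PdfFormatOption
    from docling.pipeline.vlm_pipeline import VlmPipeline

    return DocumentConverter(
        format_options={
            # Replace the standard PDF pipeline with the VLM-based one.
            InputFormat.PDF: PdfFormatOption(pipeline_cls=VlmPipeline),
        }
    )
```

Exporting the result with `result.document.export_to_html()` and comparing it against the standard pipeline's output for pages 6, 8, and 10 would show whether the VLM route recovers the spans any better for this document.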

To reply, just mention @dosu.

