Docling not able to convert complex table into correct layout structure #2241
Hi @aruneshtobrakab! I'm Dosu and I'm helping the docling team.

Docling's current table extraction has known limitations with complex tables, especially those with merged cells, multi-level headers, or rows with varying column counts. The extraction pipeline often enforces a rectangular grid and may misidentify headers or flatten nuanced layouts, which leads to the issues you described. This is a common pain point and has been reported by other users as well.

Recent updates in docling-core have introduced "rich tables" support (see version 1.6.0+). You can access the full table cell metadata (including spans and header flags) directly from the DoclingDocument JSON. For best fidelity, consider working with the JSON or HTML/Markdown exports and reconstructing the table layout yourself.

There are still no configuration flags to force value propagation for multi-column cells, and some edge cases require manual post-processing. If you want to experiment, try toggling between "accurate" and "fast" table extraction modes, or disabling cell matching.

If you need to preserve the most complex layouts, manual reconstruction from the JSON (using the full cell metadata) is currently the most reliable approach. The maintainers are aware of these limitations and are working on improvements, but perfect preservation of all complex table structures is not yet guaranteed.
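As a rough illustration of the manual reconstruction mentioned above, here is a minimal sketch that rebuilds an HTML table from a list of cell records carrying span and header metadata. The field names (`start_row_offset_idx`, `start_col_offset_idx`, `row_span`, `col_span`, `column_header`, `text`) are assumed to match the table cell entries in the DoclingDocument JSON export; check them against your own export before relying on this. The helper `cells_to_html` and the sample data are hypothetical and are not part of docling.

```python
from html import escape


def cells_to_html(cells):
    """Render cell records (with row/column spans) as an HTML table.

    Assumes each span-carrying cell appears once in `cells`, keyed by its
    top-left position. Field names are assumptions about the JSON export.
    """
    # Group cells by the row in which they start.
    rows = {}
    for cell in cells:
        rows.setdefault(cell["start_row_offset_idx"], []).append(cell)

    parts = ["<table>"]
    for row_idx in sorted(rows):
        parts.append("  <tr>")
        # Emit the cells of this row left to right.
        for cell in sorted(rows[row_idx], key=lambda c: c["start_col_offset_idx"]):
            tag = "th" if cell.get("column_header") else "td"
            attrs = ""
            if cell.get("row_span", 1) > 1:
                attrs += f' rowspan="{cell["row_span"]}"'
            if cell.get("col_span", 1) > 1:
                attrs += f' colspan="{cell["col_span"]}"'
            parts.append(f"    <{tag}{attrs}>{escape(cell['text'])}</{tag}>")
        parts.append("  </tr>")
    parts.append("</table>")
    return "\n".join(parts)


# Hypothetical sample: a header row with one merged cell, then a data row.
sample_cells = [
    {"text": "Item", "start_row_offset_idx": 0, "start_col_offset_idx": 0,
     "column_header": True},
    {"text": "Dimensions", "start_row_offset_idx": 0, "start_col_offset_idx": 1,
     "col_span": 2, "column_header": True},
    {"text": "Bolt", "start_row_offset_idx": 1, "start_col_offset_idx": 0},
    {"text": "10", "start_row_offset_idx": 1, "start_col_offset_idx": 1},
    {"text": "20", "start_row_offset_idx": 1, "start_col_offset_idx": 2},
]

html_table = cells_to_html(sample_cells)
print(html_table)
```

Because merged cells are written once with `rowspan`/`colspan` rather than repeated across a rectangular grid, rows with different numbers of visible cells are preserved in the output.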
I converted a document into a lossless JSON structure using docling (via the DoclingDocument object).
From that document object, I then inspected the table structure that docling inferred for some of the tables.
The PDF contains complex tables on pages 6, 8, and 10 that do not follow a standard layout.
The docling output fails to preserve the original table structure in several critical ways.
Original Table in Document: (image not preserved)
Docling's JSON-to-Table Conversion: (image not preserved)
The key issues with the conversion are:
Incorrect Header Identification: The first row of the table was incorrectly identified as a data row instead of a column header.
Flawed Structural Assumption: The pipeline assumed a simple table structure, enforcing an equal number of columns for every row. This is incorrect, as the original table has rows with varying column spans (e.g., the 4th row has two major columns, while others have four).
Inaccurate Column Count: The conversion resulted in 8 columns, whereas the original visual layout clearly contains 6 primary columns.
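To check these three points against what the converter actually recorded, a small diagnostic over the exported table cells can help. The sketch below is only an illustration: it assumes each cell record carries `start_row_offset_idx`, `start_col_offset_idx`, `col_span`, and `column_header` fields as in the DoclingDocument JSON, and `summarize_table` and the sample data are hypothetical.

```python
def summarize_table(cells):
    """Report grid width, cells per row, and rows flagged as column headers."""
    cells_per_row = {}
    header_rows = set()
    num_cols = 0
    for cell in cells:
        row = cell["start_row_offset_idx"]
        cells_per_row[row] = cells_per_row.get(row, 0) + 1
        if cell.get("column_header"):
            header_rows.add(row)
        # The right edge of the cell determines the overall grid width.
        right_edge = cell["start_col_offset_idx"] + cell.get("col_span", 1)
        num_cols = max(num_cols, right_edge)
    return {
        "num_cols": num_cols,
        "cells_per_row": cells_per_row,
        "header_rows": sorted(header_rows),
    }


# Hypothetical sample: row 0 has four cells, row 1 has two cells spanning two columns each.
sample_cells = [
    {"text": "A", "start_row_offset_idx": 0, "start_col_offset_idx": 0},
    {"text": "B", "start_row_offset_idx": 0, "start_col_offset_idx": 1},
    {"text": "C", "start_row_offset_idx": 0, "start_col_offset_idx": 2},
    {"text": "D", "start_row_offset_idx": 0, "start_col_offset_idx": 3},
    {"text": "Left", "start_row_offset_idx": 1, "start_col_offset_idx": 0, "col_span": 2},
    {"text": "Right", "start_row_offset_idx": 1, "start_col_offset_idx": 2, "col_span": 2},
]

summary = summarize_table(sample_cells)
print(summary)
```

Comparing `num_cols` and `cells_per_row` with the visual layout shows whether extra columns were introduced, and `header_rows` shows which rows were flagged as column headers.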
T-14.150.130-SH1-O_short.pdf