
Section after table being incorrectly added as child of table header #2668

@akowalsk

Description


Bug

For a reason I cannot determine, the attached file causes the parent/child relationships in the DoclingDocument object to be mapped incorrectly, so the Markdown export (or any other export) is wrong. If you inspect the document JSON, the table is correctly represented in the "tables" array. However, if you follow the refs to the second cell of the table header (texts/3), that item has a child: the header of the section that follows the table (texts/4). On export, everything after the table is then consolidated into that header cell.

    {
      "self_ref": "#/texts/2",
      "parent": {
        "$ref": "#/groups/1"
      },
      "children": [],
      "content_layer": "body",
      "label": "section_header",
      "prov": [],
      "orig": "aaaaa",
      "text": "aaaaa",
      "level": 1
    },
    {
      "self_ref": "#/texts/3",
      "parent": {
        "$ref": "#/groups/2"
      },
      "children": [
        {
          "$ref": "#/texts/4"
        }
      ],
      "content_layer": "body",
      "label": "section_header",
      "prov": [],
      "orig": "aaaaaaaaa aaaaaa",
      "text": "aaaaaaaaa aaaaaa",
      "level": 1
    },
    {
      "self_ref": "#/texts/4",
      "parent": {
        "$ref": "#/texts/3"
      },
      "children": [
        {
          "$ref": "#/groups/3"
        },
        {
          "$ref": "#/texts/8"
        },
        {
          "$ref": "#/tables/1"
        },
        {
          "$ref": "#/texts/11"
        }
      ],
      "content_layer": "body",
      "label": "section_header",
      "prov": [],
      "orig": "aaaaaaaa",
      "text": "aaaaaaaa",
      "level": 1
    },

...
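Following the refs by hand gets tedious, so here is a minimal sketch that walks them programmatically. It works on the exported document as a plain dict (for example, the JSON above loaded with `json.load`) and assumes only the `self_ref`, `parent`, `children`, `label` and `level` fields shown in the excerpt, plus a root item at `#/body` with no parent. The helper names `resolve`, `ancestry` and `nested_section_headers` are hypothetical and not part of docling. Flagging a section header that owns another header of the same or a higher level is a heuristic, not a rule from the docling data model.

```python
from typing import Any, Iterator


def resolve(doc: dict[str, Any], ref: str) -> Any:
    """Follow a JSON reference such as "#/texts/3" or "#/body" inside the document dict."""
    node: Any = doc
    for part in ref.removeprefix("#/").split("/"):
        # List collections ("texts", "groups", "tables") are indexed by integer.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def ancestry(doc: dict[str, Any], ref: str) -> list[str]:
    """Return the parent refs of `ref`, nearest parent first, up to an item with no parent."""
    chain: list[str] = []
    item = resolve(doc, ref)
    while item.get("parent"):
        parent_ref = item["parent"]["$ref"]
        if parent_ref in chain:
            break  # guard against cycles in a malformed document
        chain.append(parent_ref)
        item = resolve(doc, parent_ref)
    return chain


def nested_section_headers(doc: dict[str, Any]) -> Iterator[tuple[str, str]]:
    """Yield (parent, child) refs where a section header has a child section header
    whose level is the same as or numerically smaller than its own."""
    for item in doc.get("texts", []):
        if item.get("label") != "section_header":
            continue
        for child in item.get("children", []):
            child_item = resolve(doc, child["$ref"])
            if (
                child_item.get("label") == "section_header"
                and child_item.get("level", 1) <= item.get("level", 1)
            ):
                yield item["self_ref"], child["$ref"]
```

Applied to the JSON from this issue, `ancestry(doc, "#/texts/4")` should begin with `#/texts/3` and `#/groups/2`, which shows that the section after the table hangs off the table header cell. `nested_section_headers(doc)` should report the pair (`#/texts/3`, `#/texts/4`), since both are level 1 headers.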

Steps to reproduce

from docling.document_converter import DocumentConverter
# The .docx file attached to this issue
filename = "docling-table-bug.docx"
converter = DocumentConverter()
result = converter.convert(filename)
print(result.document.export_to_markdown())

The generated Markdown is:

## aaaaaaaaaa

[aaaa aaaaaaaa aaa aaaaaaaa aaaaaaaaa aaaaaaaaaa aa aaaa aaaaaaaa aa aaaaaaaa aaaa aaaaaaaa. Aaa aaa aaaaaa aaaa aaaa aaa aaaaaa aaaaa aa aaaaaaaaaaa.  aa aaa aaaa aaaaaaaaa aaaa aaa aaa aaaaaaa aa aaaa aaaa.]

| ## aaaaa                                    |   ## aaaaaaaaa aaaaaa  ## aaaaaaaa  aaaaa aa *aaaaaaaaaa aaa aaaaaaaa aaaaaaaa* (aaa aaaaaaaaaa) aaa aaa aaaaa aaaaaa aaaaa aaa aaaaa aaaaaaaaaa aa aaaa aaaaaaa aa aaa aaaaa aaaaa.  [aaaaaaa aaaaaaaaaaa aaa aaaaaaaaaaaaa aaa aaaaa aaaa aaaaaa aaaa aaaa aaaaaaaaaaa aaa aaaaaaaa aaaaaaaa, aaaaaa aaa aaaaa aa aaa aaaaaa. aaaa aaaaaaaaaaa aaa aaaaaaaaaaaaa aa aaaaaaaaaaaa aaaaa.]  | ## aaaa [/](https://file+.vscode-resource.vscode-cdn.net/) Aaaaaaa   | ## aaaaaaaaaa               | |---------------------|-----------------------------| | aaa                 | aaaaaa aaa aaaaaaaaaaa aaaa | | aaa                 | aaaaaa aaaaa aaaaaaaa       | |
|---------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| aaaaaaaa aaaaaaaaaaaaa aaaaaaaaaa aa        |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            67547856 |
| aaaaaaaa aaaaaaaaaaaaa aa                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            68975669 |
| aaaaaaaa aaaaaaaaa aaa aaaaaaaaaa aaaaaaaaa |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           687956453 |
| aaaaaaaaa aaaaaaaaaaa aaaaaaaaa             |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        896756454687 |

The entire second half of the document ends up inside the second header cell of the table.
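One way to detect this symptom in the Markdown export is to look for table cells that contain more than one heading, as the second header cell above does. The following is a minimal heuristic sketch; `cells_with_merged_headings` is a hypothetical helper, not a docling API. A single heading per cell is deliberately not flagged, because in this file the first header cell legitimately renders as `## aaaaa`.

```python
import re

# A Markdown heading marker ("#" to "######" plus a space) at the start of the
# cell text or after whitespace, as it appears when headings are flattened into a cell.
HEADING_MARKER = re.compile(r"(?:^|\s)#{1,6} ")


def cells_with_merged_headings(markdown: str) -> list[str]:
    """Return pipe-table cells that contain more than one Markdown heading marker.

    Heuristic: several headings inside one cell suggests that content from
    outside the table was attached to that cell.
    """
    suspicious = []
    for line in markdown.splitlines():
        row = line.strip()
        if not row.startswith("|"):
            continue  # not a pipe-table row
        for cell in row.strip("|").split("|"):
            text = cell.strip()
            if len(HEADING_MARKER.findall(text)) > 1:
                suspicious.append(text)
    return suspicious
```

For example, a non-empty result from `cells_with_merged_headings(result.document.export_to_markdown())` would indicate that the reproduction above still shows the bug.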

Docling version

docling --version
2025-11-20 15:50:56,527 - INFO - Loading plugin 'docling_defaults'
2025-11-20 15:50:56,529 - INFO - Registered ocr engines: ['auto', 'easyocr', 'ocrmac', 'rapidocr', 'tesserocr', 'tesseract']
Docling version: 2.62.0
Docling Core version: 2.51.1
Docling IBM Models version: 3.10.2
Docling Parse version: 4.7.0
Python: cpython-312 (3.12.9)
Platform: macOS-15.7.2-arm64-arm-64bit

Python version

python --version
Python 3.12.9

docling-table-bug.docx

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working), docx (issue related to docx backend), good first issue (Issues and pull requests for new contributors)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
