Commit f6c3d14

fix: restore parent stack after processing rich table cells
In _handle_tables, when _walk_linear is called for rich table cells, it modifies self.parents, self.level, and self.level_at_new_list. These changes leak into subsequent document processing, causing sections after tables with formatted cells to be incorrectly nested.

Save and restore the parser state (parents dict, level, and level_at_new_list) around the _walk_linear call for rich cells. Update ground truth for docx_rich_cells to reflect the corrected document structure.

Resolves #2668

Signed-off-by: Br1an67 <932039080@qq.com>
1 parent 1eb5c21 commit f6c3d14

7 files changed

+525
-51
lines changed

docling/backend/msword_backend.py

Lines changed: 6 additions & 0 deletions
@@ -1454,7 +1454,13 @@ def _handle_tables(
 rich_table_cell: bool = self._is_rich_table_cell(cell)

 if rich_table_cell:
+    saved_parents = dict(self.parents)
+    saved_level = self.level
+    saved_level_at_new_list = self.level_at_new_list
     _, provs_in_cell = self._walk_linear(cell._element, doc)
+    self.parents = saved_parents
+    self.level = saved_level
+    self.level_at_new_list = saved_level_at_new_list
 _log.debug(f"Table cell {row_idx},{col_idx} rich? {rich_table_cell}")

 if len(provs_in_cell) > 0:
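The save/restore in the hunk above could also be packaged as a context manager. The following is only a minimal sketch of that idea, not code from this commit: `ToyParser`, `preserved_state`, and `walk_cell` are hypothetical names, and the class models nothing beyond the three fields the commit message mentions. Unlike the inline version in the commit, this sketch uses `try`/`finally`, so the state is restored even if the nested walk raises.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, Optional


class ToyParser:
    """Hypothetical stand-in for the backend; it holds only the three fields
    named in the commit message."""

    def __init__(self) -> None:
        self.parents: Dict[int, Optional[str]] = {0: "root", 1: "First Section"}
        self.level = 1
        self.level_at_new_list: Optional[int] = None

    @contextmanager
    def preserved_state(self) -> Iterator[None]:
        # Shallow copy of the dict, mirroring dict(self.parents) in the commit.
        saved = (dict(self.parents), self.level, self.level_at_new_list)
        try:
            yield
        finally:
            # Restore all three fields, even if the body raised.
            self.parents, self.level, self.level_at_new_list = saved

    def walk_cell(self) -> None:
        # Simulates a nested walk that meets a heading inside a table cell
        # and therefore mutates the parser state.
        self.parents[2] = "Cell Sub-heading"
        self.level = 2
        self.level_at_new_list = 2


parser = ToyParser()
with parser.preserved_state():
    parser.walk_cell()

print(parser.level, sorted(parser.parents))
```

After the `with` block, the toy parser is back at the level and parent mapping it had before the cell was processed, which is the behavior the commit restores for content following a table.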
36 KB
Binary file not shown.

tests/data/groundtruth/docling_v2/docx_rich_cells.docx.itxt

Lines changed: 50 additions & 50 deletions
@@ -48,60 +48,60 @@ Second Paragraph
 item-45 at level 5: unspecified: group rich_cell_group_1_0_3
 item-46 at level 6: text: This is a paragraph
 item-47 at level 6: text: This is another paragraph
-item-48 at level 4: inline: group group
+item-48 at level 4: text:
 item-49 at level 4: text:
 item-50 at level 4: text:
 item-51 at level 4: text:
 item-52 at level 4: text:
 item-53 at level 4: text:
-item-54 at level 4: text:
-item-55 at level 3: section_header: Table with nested table
-item-56 at level 4: text: Before table
-item-57 at level 4: table with [3x2]
-item-58 at level 5: unspecified: group rich_cell_group_2_1_1
-item-59 at level 6: text: Simple cell with
-item-60 at level 6: text: bold
-item-61 at level 6: text: and
-item-62 at level 6: text: italic
-item-63 at level 6: text: text
-item-64 at level 5: unspecified: group rich_cell_group_3_0_2
-item-65 at level 6: table with [2x3]
-item-66 at level 7: unspecified: group rich_cell_group_3_0_1
-item-67 at level 8: text: Cell 1
-item-68 at level 7: unspecified: group rich_cell_group_3_1_1
-item-69 at level 8: text: Cell 2
-item-70 at level 7: unspecified: group rich_cell_group_3_2_1
-item-71 at level 8: text: Cell 3
-item-72 at level 6: text:
-item-73 at level 5: unspecified: group rich_cell_group_4_1_2
-item-74 at level 6: text: Rich cell
+item-54 at level 3: section_header: Table with nested table
+item-55 at level 4: text: Before table
+item-56 at level 4: table with [3x2]
+item-57 at level 5: unspecified: group rich_cell_group_2_1_1
+item-58 at level 6: text: Simple cell with
+item-59 at level 6: text: bold
+item-60 at level 6: text: and
+item-61 at level 6: text: italic
+item-62 at level 6: text: text
+item-63 at level 5: unspecified: group rich_cell_group_3_0_2
+item-64 at level 6: table with [2x3]
+item-65 at level 7: unspecified: group rich_cell_group_3_0_1
+item-66 at level 8: text: Cell 1
+item-67 at level 7: unspecified: group rich_cell_group_3_1_1
+item-68 at level 8: text: Cell 2
+item-69 at level 7: unspecified: group rich_cell_group_3_2_1
+item-70 at level 8: text: Cell 3
+item-71 at level 6: text:
+item-72 at level 5: unspecified: group rich_cell_group_4_1_2
+item-73 at level 6: text: Rich cell
 A nested table
-item-75 at level 6: table with [2x3]
-item-76 at level 7: unspecified: group rich_cell_group_4_0_1
-item-77 at level 8: text: Cell 1
-item-78 at level 7: unspecified: group rich_cell_group_4_1_1
-item-79 at level 8: text: Cell 2
-item-80 at level 7: unspecified: group rich_cell_group_4_2_1
-item-81 at level 8: text: Cell 3
-item-82 at level 6: text:
+item-74 at level 6: table with [2x3]
+item-75 at level 7: unspecified: group rich_cell_group_4_0_1
+item-76 at level 8: text: Cell 1
+item-77 at level 7: unspecified: group rich_cell_group_4_1_1
+item-78 at level 8: text: Cell 2
+item-79 at level 7: unspecified: group rich_cell_group_4_2_1
+item-80 at level 8: text: Cell 3
+item-81 at level 6: text:
+item-82 at level 4: inline: group group
 item-83 at level 4: inline: group group
-item-84 at level 4: inline: group group
-item-85 at level 5: text: After table with
-item-86 at level 5: text: bold
-item-87 at level 5: text: ,
-item-88 at level 5: text: underline
-item-89 at level 5: text: ,
-item-90 at level 5: text: strikethrough
-item-91 at level 5: text: , and
-item-92 at level 5: text: italic
-item-93 at level 5: text: formatting
-item-94 at level 4: text:
-item-95 at level 3: section_header: Table with pictures
-item-96 at level 4: text:
-item-97 at level 4: table with [3x2]
-item-98 at level 5: unspecified: group rich_cell_group_5_1_1
-item-99 at level 6: picture
-item-100 at level 5: unspecified: group rich_cell_group_5_0_2
-item-101 at level 6: text: Text and picture
-item-102 at level 6: picture
-item-103 at level 4: text:
+item-84 at level 5: text: After table with
+item-85 at level 5: text: bold
+item-86 at level 5: text: ,
+item-87 at level 5: text: underline
+item-88 at level 5: text: ,
+item-89 at level 5: text: strikethrough
+item-90 at level 5: text: , and
+item-91 at level 5: text: italic
+item-92 at level 5: text: formatting
+item-93 at level 4: text:
+item-94 at level 3: section_header: Table with pictures
+item-95 at level 4: text:
+item-96 at level 4: table with [3x2]
+item-97 at level 5: unspecified: group rich_cell_group_5_1_1
+item-98 at level 6: picture
+item-99 at level 5: unspecified: group rich_cell_group_5_0_2
+item-100 at level 6: text: Text and picture
+item-101 at level 6: picture
+item-102 at level 4: text:
+item-103 at level 1: inline: group group
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+item-0 at level 0: unspecified: group _root_
+item-1 at level 1: section: group header-0
+item-2 at level 2: section_header: First Section
+item-3 at level 3: text: Content before table.
+item-4 at level 3: table with [2x2]
+item-5 at level 4: unspecified: group rich_cell_group_1_0_0
+item-6 at level 5: text: Bold header text
+item-7 at level 5: text: Additional text in cell
+item-8 at level 4: unspecified: group rich_cell_group_1_1_0
+item-9 at level 5: text: Another bold header
+item-10 at level 5: section_header: Cell Sub-heading
+item-11 at level 2: section_header: Second Section
+item-12 at level 3: text: This text should be under Second Section.
+item-13 at level 3: section_header: Third Section
+item-14 at level 4: text: Under third section.
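
The new ground truth above encodes the expected nesting as `item-N at level L: label` lines. As a rough illustration of how that nesting could be checked, here is a small sketch that parses such lines and compares levels. The helper names (`Item`, `parse_items`, `level_of`) are hypothetical and not part of docling or its test suite; the sample lines are copied from the ground truth above.

```python
import re
from typing import List, NamedTuple


class Item(NamedTuple):
    index: int
    level: int
    label: str


# Matches lines such as "item-11 at level 2: section_header: Second Section".
ITEM_RE = re.compile(r"^\s*item-(\d+) at level (\d+): (.*)$")


def parse_items(text: str) -> List[Item]:
    """Collect (index, level, label) for every line in the item format."""
    items = []
    for line in text.splitlines():
        match = ITEM_RE.match(line)
        if match:
            items.append(Item(int(match.group(1)), int(match.group(2)), match.group(3)))
    return items


def level_of(items: List[Item], label: str) -> int:
    """Return the level of the first item whose label matches exactly."""
    for item in items:
        if item.label == label:
            return item.level
    raise KeyError(label)


SAMPLE = """\
item-2 at level 2: section_header: First Section
item-4 at level 3: table with [2x2]
item-10 at level 5: section_header: Cell Sub-heading
item-11 at level 2: section_header: Second Section
item-13 at level 3: section_header: Third Section
"""

items = parse_items(SAMPLE)
first = level_of(items, "section_header: First Section")
second = level_of(items, "section_header: Second Section")
print(first, second)
```

In this sample, the heading that follows the table sits at the same level as the heading before it, even though a sub-heading appeared inside a cell in between.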
