fix(docx): preserve parent state when processing table cells #3076
Closed
Br1an67 wants to merge 1 commit into docling-project:main
When processing rich table cells and 1x1 tables, _walk_linear modifies self.parents (especially if cells contain headings). This corrupted the parent hierarchy, causing section headers after tables to be incorrectly added as children of table header cells. Save and restore self.parents around _walk_linear calls in _handle_tables to prevent parent state pollution, following the same pattern used in _handle_textbox_content.
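The save-and-restore pattern described above can be sketched in isolation. This is a minimal illustration, not the PR's code: `ToyBackend`, `_handle_cell`, and the shape of `parents` (a dict from level to current parent) are assumptions made for the example, and the `try`/`finally` is one possible way to guarantee the restore.

```python
from typing import Optional


class ToyBackend:
    """Hypothetical stand-in for the DOCX backend; not docling's real class."""

    def __init__(self) -> None:
        # Assumed shape: heading level -> current parent label (None = document root).
        self.parents: dict[int, Optional[str]] = {0: None}

    def _walk_linear(self, elements: list[str]) -> None:
        # Toy walker: every heading it meets becomes the current level-0 parent.
        for element in elements:
            if element.startswith("heading:"):
                self.parents[0] = element

    def _handle_cell(self, cell_elements: list[str]) -> None:
        saved_parents = self.parents.copy()  # snapshot before descending into the cell
        try:
            self._walk_linear(cell_elements)
        finally:
            self.parents = saved_parents  # restore so the cell cannot leak parent state


backend = ToyBackend()
backend._handle_cell(["heading:Cell title", "text:value"])
print(backend.parents)  # {0: None}
```

Copying the dict before the call matters: restoring a reference to the same object would bring back whatever the walker wrote into it.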
❌ DCO Check Failed
Hi @Br1an67, your pull request has failed the Developer Certificate of Origin (DCO) check. This repository supports remediation commits, so you can fix this without rewriting history, but you must follow the required message format.
Quick Fix: Add a remediation commit
Run this command:
    git commit --allow-empty -s -m "DCO Remediation Commit for Br1an67 <932039080@qq.com>
    I, Br1an67 <932039080@qq.com>, hereby add my Signed-off-by to this commit: 14bff6bdf19692003f85899bf42d07e15caf5031"
    git push
Advanced: Sign off each commit directly
For the latest commit:
    git commit --amend --signoff
    git push --force-with-lease
For multiple commits:
    git rebase --signoff origin/main
    git push --force-with-lease
More info: DCO check report
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit
Wonderful, this rule succeeded.
Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
Codecov Report
✅ All modified and coverable lines are covered by tests.
Issue resolved by this Pull Request:
Resolves #2668
Summary
Fixed a bug where section headers appearing after tables in DOCX documents were incorrectly being added as children of table header cells. This caused the document hierarchy to be corrupted, resulting in malformed markdown and other exports.
Root Cause
The _handle_tables method in msword_backend.py calls _walk_linear to process rich table cell content. However, _walk_linear modifies self.parents (especially when cells contain headings). This parent state was never restored after processing each cell, polluting the parent hierarchy for subsequent document elements.
Changes
- Save self.parents.copy() before calling _walk_linear in two places within _handle_tables (the 1x1 table path and the rich table cell path)
- Restore self.parents after processing each cell
This follows the same pattern already used in _handle_textbox_content.
Testing
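As an illustration only (this is not the PR's test suite), the failure mode can be reproduced in a toy model and checked with and without the restore step. Every name here (`build_hierarchy`, `walk_cell`, the node labels) is hypothetical, and the level-to-parent dict is an assumed simplification of the backend's parent tracking.

```python
def build_hierarchy(restore_parents: bool) -> dict[str, str]:
    """Toy converter: returns the parent each node was attached to."""
    parents = {0: "body"}  # assumed shape: level -> current parent
    placed: dict[str, str] = {}

    def walk_cell(headings: list[str]) -> None:
        # Toy stand-in for _walk_linear: a heading in the cell becomes the current parent.
        for heading in headings:
            placed[heading] = "table_header_cell"
            parents[0] = heading

    placed["table"] = parents[0]
    saved = parents.copy()  # snapshot taken before processing the cell
    walk_cell(["cell_heading"])
    if restore_parents:
        parents = saved  # undo whatever the cell walk changed

    # The next section header attaches to whatever parent is current at this point.
    placed["section_header_after_table"] = parents[0]
    return placed


print(build_hierarchy(restore_parents=False)["section_header_after_table"])  # cell_heading
print(build_hierarchy(restore_parents=True)["section_header_after_table"])   # body
```

Without the restore, the header that follows the table ends up under content from inside the cell; with it, the header attaches at the same level as the table.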
Checklist: