Skip to content

fix: handle external image references in docx files#3148

Closed
joaquinhuigomez wants to merge 2 commits intodocling-project:mainfrom
joaquinhuigomez:fix/docx-external-image-crash
Closed

fix: handle external image references in docx files#3148
joaquinhuigomez wants to merge 2 commits intodocling-project:mainfrom
joaquinhuigomez:fix/docx-external-image-crash

Conversation

@joaquinhuigomez
Copy link
Contributor

@joaquinhuigomez joaquinhuigomez commented Mar 17, 2026

No description provided.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 17, 2026

DCO Check Passed

Thanks @joaquinhuigomez, all your commits are properly signed off. 🎉

@joaquinhuigomez joaquinhuigomez changed the title fix: handle PermissionError for directory input on Windows CLI fix: handle external image references in docx files Mar 17, 2026
@mergify
Copy link

mergify bot commented Mar 17, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

joaquinhui and others added 2 commits March 18, 2026 11:29
When a .docx file contains image relationships with TargetMode="External"
(e.g. URLs pointing to external images), accessing rel.target_part raises
an error because external relationships don't have a target_part. This
adds a guard to check rel.is_external before accessing target_part,
returning None so the image is gracefully skipped.

Fixes docling-project#3113

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Joaquin Hui Gomez <joaquinhui1995@gmail.com>
Signed-off-by: Joaquin Hui Gomez <joaquinhui1995@gmail.com>
@joaquinhuigomez joaquinhuigomez force-pushed the fix/docx-external-image-crash branch from 657cfa0 to aa2ca5a Compare March 18, 2026 11:29
@PeterStaar-IBM PeterStaar-IBM requested review from PeterStaar-IBM, ceberam and dolfim-ibm and removed request for ceberam and dolfim-ibm March 18, 2026 15:43
)
if rId in self.docx_obj.part.rels:
rel = self.docx_obj.part.rels[rId]
# External relationships (e.g. URLs) don't have a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we support URI in the image part, maybe we should propagate that?

@codecov
Copy link

codecov bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
docling/backend/msword_backend.py 75.00% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

@joaquinhuigomez
Copy link
Contributor Author

Good point — I'll look into propagating the URI through to the image part instead of just downloading and embedding it. Let me update the PR.

@ceberam
Copy link
Member

ceberam commented Mar 18, 2026

Thanks @joaquinhuigomez for your interest in Docling and your contributions!
The issue #3113 was already being addressed through PR #3114 and it will be merged as soon as the change requests from the reviews are completed.

However, as @PeterStaar-IBM pointed out, we could (optionally) fetch external images like we do in the HTMLDocumentBackend, instead of ignoring them. If you would be interested to contribute on this, I would suggest that you:

  • Close this PR.
  • Open a new issue of type Feature request and fill up the form, describing the feature.
  • Add a comment on the new issue expressing the willingness to implement the feature. We will add you as assignee.
  • Once you have the implementation ready, open a PR and link it to the issue, making sure it passes all the style checks.

Thanks again for your contributions!

@joaquinhuigomez
Copy link
Contributor Author

Closing this per maintainer guidance. The requested next step is to open a feature request for optionally fetching external images in DOCX (similar to HTMLDocumentBackend), then link a future implementation PR to that issue. I’ll follow that route instead of keeping this PR open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants