fix: convert PIL images to RGB before picture description #3014

Open
aatrey56 wants to merge 2 commits into docling-project:main from aatrey56:main

Conversation

@aatrey56

Documents frequently contain images in non-RGB modes — PNGs with transparency (RGBA), grayscale scans (L), or palette/indexed color (P). These were being passed directly to _annotate_images without any mode check. Transformers processors and VLM engines require 3-channel RGB input, so any non-RGB image would either crash the pipeline or produce incorrect output silently.

The fix is a single '.convert("RGB")' call in 'PictureDescriptionBaseModel.__call__', at the point where images are batched before being forwarded to '_annotate_images'. Placing it in the base class means all three subclasses benefit automatically:
'PictureDescriptionVlmModel' (transformers), 'PictureDescriptionVlmEngineModel' (engine abstraction), and 'PictureDescriptionApiModel'.
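
The placement argument above can be illustrated with a minimal sketch. The class names 'BaseDescriber' and 'ModeEchoDescriber' are hypothetical stand-ins, not docling's classes, and the '__call__' signature is simplified for illustration; only the pattern (normalize once in the base class, then delegate to '_annotate_images') is taken from the description.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List

from PIL import Image


class BaseDescriber(ABC):
    """Hypothetical stand-in for a picture-description base class."""

    def __call__(self, images: Iterable[Image.Image]) -> List[str]:
        # Normalize every image to 3-channel RGB once, in the shared entry
        # point, so no subclass has to repeat the mode check.
        rgb_batch = [img.convert("RGB") for img in images]
        return list(self._annotate_images(rgb_batch))

    @abstractmethod
    def _annotate_images(self, images: List[Image.Image]) -> Iterable[str]:
        """Subclasses implement the actual description backend."""


class ModeEchoDescriber(BaseDescriber):
    """Toy subclass: reports the mode it received instead of running a model."""

    def _annotate_images(self, images: List[Image.Image]) -> Iterable[str]:
        for img in images:
            yield f"mode={img.mode} size={img.size}"


# A batch mixing the non-RGB modes named in the description.
batch = [
    Image.new("RGBA", (4, 4), (255, 0, 0, 128)),
    Image.new("L", (4, 4), 200),
    Image.new("P", (4, 4)),
]
descriptions = ModeEchoDescriber()(batch)
print(descriptions)
```

Any further subclass added under this hypothetical base would receive RGB input without changes of its own, which is the benefit the description claims for the three real subclasses.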

'Image.convert("RGB")' is safe to call unconditionally: if the image is already RGB, it returns an unchanged copy.
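
As a quick check of this behavior, the following sketch converts one small synthetic image per mode using Pillow. The sample images and variable names are made up for illustration and are not part of the pull request.

```python
from PIL import Image

# One tiny synthetic image per mode mentioned in the description, plus RGB.
samples = {
    "RGBA": Image.new("RGBA", (2, 2), (10, 20, 30, 0)),
    "L": Image.new("L", (2, 2), 128),
    "P": Image.new("P", (2, 2)),
    "RGB": Image.new("RGB", (2, 2), (1, 2, 3)),
}

# The same unconditional call is applied to every image, whatever its mode.
converted = {mode: img.convert("RGB") for mode, img in samples.items()}

for mode, img in converted.items():
    print(mode, "->", img.mode, img.getbands())

# For an image that is already RGB, convert() hands back a separate copy
# with the same pixel data rather than the original object.
rgb_in = samples["RGB"]
rgb_out = converted["RGB"]
print(rgb_out is rgb_in, rgb_out.getpixel((0, 0)))
```

Note that converting RGBA to RGB this way drops the alpha channel without compositing onto a background, so fully transparent pixels keep whatever color values they stored.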

Issue resolved by this Pull Request:
Resolves #3000

Checklist

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

Non-RGB image modes (RGBA, L, P) cause failures or incorrect output
when passed to transformers processors or VLM engines, which expect
3-channel RGB input. Convert in the base model's __call__ so all
subclasses (transformers, engine, API) benefit from a single fix.

Closes docling-project#3000

Signed-off-by: aatrey56 <aatrey.sahay@gmail.com>
…rsion

fix: convert PIL images to RGB before picture description
@github-actions
Contributor

DCO Check Passed

Thanks @aatrey56, all your commits are properly signed off. 🎉

@mergify
mergify bot commented Feb 19, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

@dosubot
dosubot bot commented Feb 19, 2026

Related Documentation

Checked 15 published document(s) in 1 knowledge base(s). No updates required.

@codecov
codecov bot commented Feb 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Development

Successfully merging this pull request may close these issues.

picture_description_vlm_model._annotate_images should convert PIL images to RGB before passing to transformers
