fix(cli): avoid generating images for non-image exports #3127
dolfim-ibm merged 4 commits into docling-project:main
Signed-off-by: Hassan Raza <raihassanraza10@gmail.com>
Signed-off-by: Hassan Raza <raihassanraza10@gmail.com>
Related Documentation
1 document(s) may need updating based on files changed in this PR:
Docling: How can I improve the resolution or quality of images extracted from a PDF using docling?
Suggested changes:
@@ -9,7 +9,7 @@
)
```
-If you use the CLI, docling sets `images_scale=2` by default when exporting images, but there is no direct CLI flag to set it higher; for more control, you need to customize the pipeline in code ([example](https://github.com/docling-project/docling/blob/aab3ff5d82fc54864657c0c2ff8e0aa21461f23f/docling/cli/main.py#L379-L965)).
+When using the CLI with image-capable export formats (JSON, YAML, HTML, Markdown), docling sets `images_scale=2` by default when image export mode is not set to placeholder. For text-only export formats (text, doctags, vtt), images are not generated regardless of the image export mode setting. There is no direct CLI flag to set `images_scale` higher than 2; for more control, you need to customize the pipeline in code ([example](https://github.com/docling-project/docling/blob/aab3ff5d82fc54864657c0c2ff8e0aa21461f23f/docling/cli/main.py#L379-L965)).
**Note:** There are known issues in some recent versions where setting `images_scale` above 1.0 can cause bugs with image cropping or bounding box scaling, resulting in incorrectly framed images ([details](https://github.com/docling-project/docling/issues/2416)). If you encounter this, a temporary workaround is to extract at `scale=1.0` and manually upscale images, though this will not improve actual detail or framing.
@M-Hassan-Raza I am not really sure this is strictly necessary. @cau-git what do you think?
Hey! Yeah, this isn't fixing broken output; everything works as-is. It's more of an optimization: when exporting to text-only formats like text, doctags, or vtt, the CLI still generates page and picture images that never get referenced in the output. For larger PDFs that means rendering every page at 2x scale, cropping picture elements, and keeping all those PIL images in memory for nothing. This just skips that work when the output format won't use it. Totally open to keeping things as they are if there's a reason to always generate them though! Again, I haven't traversed the entire codebase, so my analysis and understanding might be a bit off 😁
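For a rough sense of the cost mentioned above, here is a back-of-the-envelope sketch. It assumes a US Letter page (612 x 792 points), that a scale of 1.0 maps one point to one pixel, and uncompressed 8-bit RGB; none of these figures come from docling itself, and `rendered_page_bytes` is a hypothetical helper used only for illustration.

```python
def rendered_page_bytes(width_pt: float, height_pt: float, scale: float,
                        channels: int = 3) -> int:
    """Estimate the in-memory size of one rendered page image.

    Assumption: scale 1.0 renders one pixel per point, one byte per channel.
    """
    width_px = round(width_pt * scale)
    height_px = round(height_pt * scale)
    return width_px * height_px * channels


# Hypothetical US Letter page, compared at scale 1.0 and scale 2.0.
at_1x = rendered_page_bytes(612, 792, 1.0)
at_2x = rendered_page_bytes(612, 792, 2.0)
print(f"1x: {at_1x / 2**20:.1f} MiB, 2x: {at_2x / 2**20:.1f} MiB per page")
```

Pixel count grows with the square of the scale, so doubling the scale quadruples the per-page footprint under these assumptions.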
I'm also not sure this is needed, and some of the formats in the disable-list could actually work with images. |
Thanks for taking a closer look here, that’s fair. I went back through the current CLI export paths again to sanity check the assumption. From what I can see, JSON/YAML/HTML/Markdown are the outputs that actually consume image_mode; text explicitly forces placeholder, and doctags / vtt don’t take image_mode in the CLI export path at all. So the goal here was only to skip the extra image generation work for outputs that currently can’t reference those images anyway. That said, if this feels too marginal or just not worth the added policy in the CLI, I’m completely fine dropping it rather than pushing an optimization that isn’t considered valuable enough.
@M-Hassan-Raza I think your PR is addressing a valid point, it could save resources to not generate page images / element images at the output scale if not desired from the CLI args. The implementation looks correct, since it is only addressing output from the Here are my two cents:
Signed-off-by: Hassan Raza <raihassanraza10@gmail.com>
@cau-git addressed your review feedback on this branch:
Could you take another look?
Signed-off-by: Hassan Raza <raihassanraza10@gmail.com>
CLI currently turns on page and picture image generation for any non-placeholder image export mode, even for outputs like `text`, `doctags`, and `vtt` that never use images.

This keeps the existing behavior for image-capable exports like markdown, HTML, JSON, and YAML, but skips the extra image work for text-only exports. I also added CLI coverage around both paths.
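The policy described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the `OutputFormat` and `ImageMode` enums are simplified stand-ins, and `IMAGE_CAPABLE_FORMATS` and `should_generate_images` are hypothetical names. Treating the request as image-capable when any requested format can use images is also an assumption.

```python
from enum import Enum
from typing import Iterable


class OutputFormat(str, Enum):  # simplified stand-in, not docling's enum
    MARKDOWN = "md"
    JSON = "json"
    YAML = "yaml"
    HTML = "html"
    TEXT = "text"
    DOCTAGS = "doctags"
    VTT = "vtt"


class ImageMode(str, Enum):  # simplified stand-in for the image export mode
    PLACEHOLDER = "placeholder"
    EMBEDDED = "embedded"
    REFERENCED = "referenced"


# Formats whose exporters can actually reference images (per the discussion).
IMAGE_CAPABLE_FORMATS = frozenset({
    OutputFormat.MARKDOWN, OutputFormat.JSON,
    OutputFormat.YAML, OutputFormat.HTML,
})


def should_generate_images(to_formats: Iterable[OutputFormat],
                           image_mode: ImageMode) -> bool:
    """Return True only if some requested output could use generated images."""
    if image_mode == ImageMode.PLACEHOLDER:
        return False  # placeholders never need real image data
    return any(fmt in IMAGE_CAPABLE_FORMATS for fmt in to_formats)


# Text-only export: skip page/picture image generation.
print(should_generate_images([OutputFormat.TEXT], ImageMode.EMBEDDED))
# Mixed export: Markdown still needs the images.
print(should_generate_images([OutputFormat.TEXT, OutputFormat.MARKDOWN],
                             ImageMode.EMBEDDED))
```

In a sketch like this, the result would gate whether page and picture image generation (and the 2x image scale) is switched on in the pipeline options before conversion.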