-
Docling expects the external API's response for image descriptions to closely match the OpenAI API format, specifically returning a field with the image description text (usually called "description"). If the AOAI response is nested differently, uses different field names, or omits this field, Docling may silently fail to produce PictureDescriptionData, even if the API call itself succeeds. This is a common cause of the issue you're seeing, and no explicit error is raised in these cases, so enabling debug logging may help diagnose it further. The core logic for parsing the API response is in the plugin files. There are no direct examples or test cases for AOAI or OpenAI Vision integration in the public docling-core repo, but the integration pattern is the same: ensure your API returns a JSON object with the description text in the field Docling expects.
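One way to sanity-check your AOAI payload locally is to try extracting the text the same way an OpenAI-compatible client would. The sketch below assumes the description is read from `choices[0].message.content`, the standard chat-completions shape; verify against the response-parsing code in your installed Docling version:

```python
import json


def extract_description(raw: str):
    """Pull the description text out of an OpenAI-style chat completions
    response; return None if the payload doesn't match that shape."""
    try:
        payload = json.loads(raw)
        return payload["choices"][0]["message"]["content"]
    except (json.JSONDecodeError, KeyError, IndexError, TypeError):
        return None


# A well-formed chat-completions response yields the text:
ok = '{"choices": [{"message": {"content": "A bar chart of quarterly sales."}}]}'
print(extract_description(ok))  # A bar chart of quarterly sales.

# A differently-shaped AOAI response yields None, i.e. a silent miss:
bad = '{"output": {"description": "A bar chart of quarterly sales."}}'
print(extract_description(bad))  # None
```

If the function returns None for your captured AOAI response, the response shape is the likely culprit.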
-
The setup looks correct. How are you checking whether the description is populated? Are you looking at the content of the annotations in the DoclingDocument, exporting to Markdown, or something else? There is indeed a relevant parameter involved here as well.
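To inspect the annotations directly, something like the helper below can be used. It is written duck-typed on purpose (the attribute names `pictures`, `annotations`, and `text` follow the DoclingDocument model, but check them against your installed Docling version):

```python
def collect_picture_descriptions(doc):
    """Walk doc.pictures and collect the text of any description-style
    annotations (objects exposing a .text attribute)."""
    results = []
    for picture in getattr(doc, "pictures", []):
        for ann in getattr(picture, "annotations", []):
            text = getattr(ann, "text", None)
            if text:
                results.append(text)
    return results
```

After a conversion, calling `collect_picture_descriptions(result.document)` should return a non-empty list if descriptions were actually generated; an empty list narrows the problem to the API call or response parsing rather than the export step.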
-
Hi all, I'm currently exploring how to use Azure OpenAI (AOAI) API to generate image descriptions for document processing in Docling. I aim to integrate this into a VLM pipeline that generates Markdown summaries with image annotations. However, I’ve encountered a problem: the image descriptions are not being generated, despite the API being called with the proper config. I'm unsure if I'm missing something in how Docling expects the VLM or AOAI response format.
What I'm doing
I'm using PictureDescriptionApiOptions with AOAI, and I register it with the PDF pipeline options (same for images).
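Roughly, the setup is the following (the endpoint URL, deployment name, and API key handling are placeholders, and the exact PictureDescriptionApiOptions fields should be double-checked against your Docling version):

```python
import os

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    PictureDescriptionApiOptions,
)
from docling.document_converter import DocumentConverter, PdfFormatOption

# Placeholder AOAI endpoint: <resource> and <deployment> are hypothetical.
AOAI_URL = (
    "https://<resource>.openai.azure.com/openai/deployments/"
    "<deployment>/chat/completions?api-version=2024-02-15-preview"
)

pipeline_options = PdfPipelineOptions()
pipeline_options.do_picture_description = True
# Remote (API-based) enrichment must be explicitly enabled:
pipeline_options.enable_remote_services = True
pipeline_options.picture_description_options = PictureDescriptionApiOptions(
    url=AOAI_URL,
    headers={"api-key": os.environ["AZURE_OPENAI_API_KEY"]},
    params={"model": "gpt-4.1-mini"},
    prompt="Describe this image in two or three sentences.",
    timeout=90,
)

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)
```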
My questions
Is Docling expecting a specific response format or content structure from AOAI when describing images?
Are there examples or test cases for using external LLM services (like AOAI or OpenAI Vision) for image description?
Could this be a limitation of AOAI (not supporting image input yet in this pipeline), or is there a specific integration step I’m missing?
Notes
The deployment model is gpt-4.1-mini on Azure OpenAI.
I’ve confirmed that PictureDescriptionApiOptions is constructed and passed properly.
No errors are thrown, but no PictureDescriptionData is returned.
Any guidance or suggestions from the Docling community would be greatly appreciated!