-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Closed
Labels
bugSomething isn't workingSomething isn't working
Description
Bug Report
Description
The document processing pipeline fails during the document conversion step due to an unhandled HTTPError. The error occurs when calling the Azure OpenAI GPT-4o API (/chat/completions). The server responds with a 500 Internal Server Error.
Steps to Reproduce
- Trigger the document processing pipeline with a supported input document.
- The pipeline reaches the image tagging step using
api_image_request(). - An exception is raised from the underlying HTTP request to the Azure OpenAI endpoint.
Stack Trace
Traceback (most recent call last):
File "..\document_service.py", line 92, in _process_document_background
chunks = self.doc_processor.process(processing_file_path)
File "..\processor.py", line 38, in process
result = self.converter.convert(source_path)
File ".venv\Lib\site-packages\pydantic\_internal\_validate_call.py", line 39, in wrapper_function
return wrapper(*args, **kwargs)
File ".venv\Lib\site-packages\pydantic\_internal\_validate_call.py", line 136, in __call__
res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
File ".venv\Lib\site-packages\docling\document_converter.py", line 245, in convert
return next(all_res)
File ".venv\Lib\site-packages\docling\document_converter.py", line 268, in convert_all
for conv_res in conv_res_iter:
File ".venv\Lib\site-packages\docling\document_converter.py", line 340, in _convert
for item in map(
File ".venv\Lib\site-packages\docling\document_converter.py", line 387, in _process_document
conv_res = self._execute_pipeline(in_doc, raises_on_error=raises_on_error)
File ".venv\Lib\site-packages\docling\document_converter.py", line 410, in _execute_pipeline
conv_res = pipeline.execute(in_doc, raises_on_error=raises_on_error)
File ".venv\Lib\site-packages\docling\pipeline\base_pipeline.py", line 80, in execute
raise e
File ".venv\Lib\site-packages\docling\pipeline\base_pipeline.py", line 72, in execute
conv_res = self._build_document(conv_res)
File ".venv\Lib\site-packages\docling\pipeline\base_pipeline.py", line 270, in _build_document
raise e
File ".venv\Lib\site-packages\docling\pipeline\base_pipeline.py", line 230, in _build_document
for p in pipeline_pages: # Must exhaust!
File ".venv\Lib\site-packages\docling\pipeline\base_pipeline.py", line 195, in _apply_on_pages
yield from page_batch
File ".venv\Lib\site-packages\docling\models\api_vlm_model.py", line 101, in __call__
yield from executor.map(_vlm_request, page_batch)
File "\uv\python\cpython-3.12.11-windows-x86_64-none\Lib\concurrent\futures\_base.py", line 619, in result_iterator
yield _result_or_cancel(fs.pop())
File "\uv\python\cpython-3.12.11-windows-x86_64-none\Lib\concurrent\futures\_base.py", line 317, in _result_or_cancel
return fut.result(timeout)
File "\uv\python\cpython-3.12.11-windows-x86_64-none\Lib\concurrent\futures\_base.py", line 456, in result
return self.__get_result()
File "\uv\python\cpython-3.12.11-windows-x86_64-none\Lib\concurrent\futures\_base.py", line 401, in __get_result
raise self._exception
File "\uv\python\cpython-3.12.11-windows-x86_64-none\Lib\concurrent\futures\thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
File ".venv\Lib\site-packages\docling\models\api_vlm_model.py", line 87, in _vlm_request
page_tags = api_image_request(
File ".venv\Lib\site-packages\docling\utils\api_image_request.py", line 59, in api_image_request
r.raise_for_status()
File ".venv\Lib\site-packages\requests\models.py", line 1026, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError:
500 Server Error: Internal Server Error
for url: `https://<redacted-openai-endpoint>.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-12-01-preview`
Expected Behavior
The API should respond with a valid completion or return a controlled error (e.g., 4xx) that can be caught and handled gracefully in the pipeline.
Actual Behavior
The pipeline crashes with an unhandled HTTPError due to a 500 response from the Azure OpenAI deployment.
Environment
-
Docling version: v2.57.0
-
Python version: 3.12.11
-
Model used: gpt-4o
-
API version: 2024-12-01-preview
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working