ML OCR memory leaks? #23462

@m41denx

Description

I have searched the existing issues, both open and closed, to make sure this is not a duplicate report.

  • Yes

The bug

I have over 30k photos and videos on my Immich server, which has no GPU. After uploading a batch of fresh photos, I run immich-ml on my PC (RTX 4070 Super) via Docker Desktop, and it crunches through face detection and smart search in mere minutes. OCR, however, runs like a turtle, so I left it running overnight.

When I came back 5 hours later, the container had eaten all my resources:

  • CPU: 99.97% (AMD Ryzen 5600X, 6c/12t)
  • RAM (container): 12.32 GB / 15.58 GB
  • GPU VRAM: 11.7/12 GB dedicated, 15.5/16 GB shared, 27.2/28 GB total (ReBAR?)

You can see from the logs that it all happened in just one hour.

This behavior is not isolated to Docker Desktop, WSL, or CUDA. When Immich switched to its internal ML server, it maxed out a 48c/96t Xeon CPU, and the load did not drop even after I cancelled the OCR job and cleared the queue. The only fix was to restart the ML container.

Used model: PP-OCRv5_server (I will test the mobile version and report its performance later. UPDATE: PP-OCRv5_mobile doesn't have this issue.)
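A possible mitigation to try while this is investigated (a sketch, not a confirmed fix): ONNX Runtime's CUDA execution provider allocates from a BFC arena that grows on demand and, by default, is never shrunk, which matches the steadily climbing VRAM above. The `gpu_mem_limit` and `arena_extend_strategy` provider options can cap that growth. The helper name, the 8 GiB cap, and the model path below are placeholders, not Immich's actual code:

```python
# Sketch: cap the CUDA EP's BFC arena so it cannot grow unbounded.
# Assumes onnxruntime-gpu is installed; the cap and model path are placeholders.

def cuda_provider_options(vram_cap_gib: int) -> dict:
    """Build provider options that limit the CUDA EP's memory arena."""
    return {
        "gpu_mem_limit": vram_cap_gib * 1024**3,      # hard cap, in bytes
        "arena_extend_strategy": "kSameAsRequested",  # extend only by what is asked
    }

# import onnxruntime as ort
# session = ort.InferenceSession(
#     "PP-OCRv5_det.onnx",
#     providers=[("CUDAExecutionProvider", cuda_provider_options(8)),
#                "CPUExecutionProvider"],
# )
```

With a cap in place, an allocation like the 3.7 GB buffer in the log below would fail earlier and more predictably instead of after the arena has consumed all VRAM; it does not explain why the CPU load persisted after the job was cancelled.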

Not related, but for some reason I couldn't run OCR in parallel (concurrency > 1); the ML worker errors with:

[E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running FusedConv node. Name:'Conv.0' Status Message: CUDA error cudaErrorStreamCaptureUnsupported:operation not permitted when stream is capturing
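The `cudaErrorStreamCaptureUnsupported` message suggests that concurrent `session.run()` calls are colliding while the CUDA EP is capturing a stream, since most CUDA operations are forbidden during capture. As a hedged workaround sketch (not a fix for the underlying bug), serializing runs of the shared session with a lock should avoid the collision at the cost of parallelism; `SerializedSession` is a hypothetical wrapper, not part of Immich:

```python
# Sketch: serialize session.run() calls across worker threads, so that no
# second run can start while the CUDA EP is mid stream-capture.
# `session` stands in for the OCR model's onnxruntime InferenceSession.
import threading


class SerializedSession:
    """Wraps an inference session so only one thread runs it at a time."""

    def __init__(self, session):
        self._session = session
        self._lock = threading.Lock()

    def run(self, output_names, input_feed):
        # Holding the lock forces concurrent callers to queue up.
        with self._lock:
            return self._session.run(output_names, input_feed)
```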

The OS that Immich Server is running on

Debian 13 (via docker compose)

Version of Immich Server

v2.2.0

Version of Immich Mobile App

irrelevant

Platform with the issue

  • Server
  • Web
  • Mobile

Device make and model

No response

Your docker-compose.yml content

# ML worker compose content
name: immich

services:
  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, rocm, openvino, rknn] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda
    extends: # uncomment this section for hardware acceleration - see https://docs.immich.app/features/ml-hardware-acceleration
      file: hwaccel.ml.yml
      service: cuda # set to one of [armnn, cuda, rocm, openvino, openvino-wsl, rknn] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    environment:
      - IMMICH_VERSION=v2
    restart: always
    healthcheck:
      disable: false
    ports:
      - 3003:3003
volumes:
  model-cache:

Your .env content

IMMICH_VERSION=v2

Reproduction steps

described above

Relevant log output

immich_machine_learning  | [11/01/25 00:49:08] INFO     Setting execution providers to
immich_machine_learning  |                              ['CUDAExecutionProvider', 'CPUExecutionProvider'],
immich_machine_learning  |                              in descending order of preference
immich_machine_learning  | [11/01/25 00:49:08] INFO     Using engine_name: onnxruntime
immich_machine_learning  | 2025-11-01 01:39:54.138027723 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Concat node. Name:'Concat.16' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 3706060800
immich_machine_learning  |
immich_machine_learning  | [11/01/25 01:39:54] ERROR    Exception in ASGI application
immich_machine_learning  |
immich_machine_learning  |                              ╭─────── Traceback (most recent call last) ───────╮
immich_machine_learning  |                              │ /opt/venv/lib/python3.11/site-packages/rapidocr │
immich_machine_learning  |                              │ /inference_engine/onnxruntime/main.py:90 in     │
immich_machine_learning  |                              │ __call__                                        │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │    87 │   def __call__(self, input_content: np. │
immich_machine_learning  |                              │    88 │   │   input_dict = dict(zip(self.get_in │
immich_machine_learning  |                              │    89 │   │   try:                              │
immich_machine_learning  |                              │ ❱  90 │   │   │   return self.session.run(self. │
immich_machine_learning  |                              │    91 │   │   except Exception as e:            │
immich_machine_learning  |                              │    92 │   │   │   error_info = traceback.format │
immich_machine_learning  |                              │    93 │   │   │   raise ONNXRuntimeError(error_ │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │ /opt/venv/lib/python3.11/site-packages/onnxrunt │
immich_machine_learning  |                              │ ime/capi/onnxruntime_inference_collection.py:22 │
immich_machine_learning  |                              │ 0 in run                                        │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │    217 │   │   if not output_names:             │
immich_machine_learning  |                              │    218 │   │   │   output_names = [output.name  │
immich_machine_learning  |                              │    219 │   │   try:                             │
immich_machine_learning  |                              │ ❱  220 │   │   │   return self._sess.run(output │
immich_machine_learning  |                              │    221 │   │   except C.EPFail as err:          │
immich_machine_learning  |                              │    222 │   │   │   if self._enable_fallback:    │
immich_machine_learning  |                              │    223 │   │   │   │   print(f"EP Error: {err!s │
immich_machine_learning  |                              ╰─────────────────────────────────────────────────╯
immich_machine_learning  |                              RuntimeException: [ONNXRuntimeError] : 6 :
immich_machine_learning  |                              RUNTIME_EXCEPTION : Non-zero status code returned
immich_machine_learning  |                              while running Concat node. Name:'Concat.16' Status
immich_machine_learning  |                              Message:
immich_machine_learning  |                              /onnxruntime_src/onnxruntime/core/framework/bfc_are
immich_machine_learning  |                              na.cc:376 void*
immich_machine_learning  |                              onnxruntime::BFCArena::AllocateRawInternal(size_t,
immich_machine_learning  |                              bool, onnxruntime::Stream*, bool,
immich_machine_learning  |                              onnxruntime::WaitNotificationFn) Failed to allocate
immich_machine_learning  |                              memory for requested buffer of size 3706060800
immich_machine_learning  |
immich_machine_learning  |
immich_machine_learning  |                              The above exception was the direct cause of the
immich_machine_learning  |                              following exception:
immich_machine_learning  |
immich_machine_learning  |                              ╭─────── Traceback (most recent call last) ───────╮
immich_machine_learning  |                              │ /usr/src/immich_ml/main.py:177 in predict       │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │   174 │   │   inputs = text                     │
immich_machine_learning  |                              │   175 │   else:                                 │
immich_machine_learning  |                              │   176 │   │   raise HTTPException(400, "Either  │
immich_machine_learning  |                              │ ❱ 177 │   response = await run_inference(inputs │
immich_machine_learning  |                              │   178 │   return ORJSONResponse(response)       │
immich_machine_learning  |                              │   179                                           │
immich_machine_learning  |                              │   180                                           │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │ /usr/src/immich_ml/main.py:202 in run_inference │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │   199 │   │   response[entry["task"]] = output  │
immich_machine_learning  |                              │   200 │                                         │
immich_machine_learning  |                              │   201 │   without_deps, with_deps = entries     │
immich_machine_learning  |                              │ ❱ 202 │   await asyncio.gather(*[_run_inference │
immich_machine_learning  |                              │   203 │   if with_deps:                         │
immich_machine_learning  |                              │   204 │   │   await asyncio.gather(*[_run_infer │
immich_machine_learning  |                              │   205 │   if isinstance(payload, Image):        │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │ /usr/src/immich_ml/main.py:197 in               │
immich_machine_learning  |                              │ _run_inference                                  │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │   194 │   │   │   │   message = f"Task {entry[' │
immich_machine_learning  |                              │       output of {dep}"                          │
immich_machine_learning  |                              │   195 │   │   │   │   raise HTTPException(400,  │
immich_machine_learning  |                              │   196 │   │   model = await load(model)         │
immich_machine_learning  |                              │ ❱ 197 │   │   output = await run(model.predict, │
immich_machine_learning  |                              │   198 │   │   outputs[model.identity] = output  │
immich_machine_learning  |                              │   199 │   │   response[entry["task"]] = output  │
immich_machine_learning  |                              │   200                                           │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │ /usr/src/immich_ml/main.py:215 in run           │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │   212 │   if thread_pool is None:               │
immich_machine_learning  |                              │   213 │   │   return func(*args, **kwargs)      │
immich_machine_learning  |                              │   214 │   partial_func = partial(func, *args, * │
immich_machine_learning  |                              │ ❱ 215 │   return await asyncio.get_running_loop │
immich_machine_learning  |                              │   216                                           │
immich_machine_learning  |                              │   217                                           │
immich_machine_learning  |                              │   218 async def load(model: InferenceModel) ->  │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │ /usr/local/lib/python3.11/concurrent/futures/th │
immich_machine_learning  |                              │ read.py:58 in run                               │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │ /usr/src/immich_ml/models/base.py:60 in predict │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │    57 │   │   self.load()                       │
immich_machine_learning  |                              │    58 │   │   if model_kwargs:                  │
immich_machine_learning  |                              │    59 │   │   │   self.configure(**model_kwargs │
immich_machine_learning  |                              │ ❱  60 │   │   return self._predict(*inputs)     │
immich_machine_learning  |                              │    61 │                                         │
immich_machine_learning  |                              │    62 │   @abstractmethod                       │
immich_machine_learning  |                              │    63 │   def _predict(self, *inputs: Any, **mo │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │ /usr/src/immich_ml/models/ocr/detection.py:68   │
immich_machine_learning  |in _predict                                     │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │   65 │   │   return session                     │
immich_machine_learning  |                              │   66 │                                          │
immich_machine_learning  |                              │   67 │   def _predict(self, inputs: bytes | Ima │
immich_machine_learning  |                              │ ❱ 68 │   │   results = self.model(decode_cv2(in │
immich_machine_learning  |                              │   69 │   │   if results.boxes is None or result │
immich_machine_learning  |                              │   70 │   │   │   return self._empty             │
immich_machine_learning  |                              │   71 │   │   return {                           │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │ /opt/venv/lib/python3.11/site-packages/rapidocr │
immich_machine_learning  |                              │ /ch_ppocr_det/main.py:59 in __call__            │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │    56 │   │   if prepro_img is None:            │
immich_machine_learning  |                              │    57 │   │   │   return TextDetOutput()        │
immich_machine_learning  |                              │    58 │   │                                     │
immich_machine_learning  |                              │ ❱  59 │   │   preds = self.session(prepro_img)  │
immich_machine_learning  |                              │    60 │   │   boxes, scores = self.postprocess_ │
immich_machine_learning  |                              │    61 │   │   if len(boxes) < 1:                │
immich_machine_learning  |                              │    62 │   │   │   return TextDetOutput()        │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │ /opt/venv/lib/python3.11/site-packages/rapidocr │
immich_machine_learning  |                              │ /inference_engine/onnxruntime/main.py:93 in     │
immich_machine_learning  |                              │ __call__                                        │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │    90 │   │   │   return self.session.run(self. │
immich_machine_learning  |                              │    91 │   │   except Exception as e:            │
immich_machine_learning  |                              │    92 │   │   │   error_info = traceback.format │
immich_machine_learning  |                              │ ❱  93 │   │   │   raise ONNXRuntimeError(error_ │
immich_machine_learning  |                              │    94 │                                         │
immich_machine_learning  |                              │    95 │   def get_input_names(self) -> List[str │
immich_machine_learning  |                              │    96 │   │   return [v.name for v in self.sess │
immich_machine_learning  |                              ╰─────────────────────────────────────────────────╯
immich_machine_learning  |                              ONNXRuntimeError: Traceback (most recent call
immich_machine_learning  |                              last):
immich_machine_learning  |                                File
immich_machine_learning  |                              "/opt/venv/lib/python3.11/site-packages/rapidocr/in
immich_machine_learning  |                              ference_engine/onnxruntime/main.py", line 90, in
immich_machine_learning  |                              __call__
immich_machine_learning  |                                  return
immich_machine_learning  |                              self.session.run(self.get_output_names(),
immich_machine_learning  |                              input_dict)[0]
immich_machine_learning  |                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
immich_machine_learning  |                              ^^^^^^^^^^^^^
immich_machine_learning  |                                File
immich_machine_learning  |                              "/opt/venv/lib/python3.11/site-packages/onnxruntime
immich_machine_learning  |                              /capi/onnxruntime_inference_collection.py", line
immich_machine_learning  |                              220, in run
immich_machine_learning  |                                  return self._sess.run(output_names, input_feed,
immich_machine_learning  |                              run_options)
immich_machine_learning  |                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
immich_machine_learning  |                              ^^^^^^^^^^^^^
immich_machine_learning  |                              onnxruntime.capi.onnxruntime_pybind11_state.Runtime
immich_machine_learning  |                              Exception: [ONNXRuntimeError] : 6 :
immich_machine_learning  |                              RUNTIME_EXCEPTION : Non-zero status code returned
immich_machine_learning  |                              while running Concat node. Name:'Concat.16' Status
immich_machine_learning  |                              Message:
immich_machine_learning  |                              /onnxruntime_src/onnxruntime/core/framework/bfc_are
immich_machine_learning  |                              na.cc:376 void*
immich_machine_learning  |                              onnxruntime::BFCArena::AllocateRawInternal(size_t,
immich_machine_learning  |                              bool, onnxruntime::Stream*, bool,
immich_machine_learning  |                              onnxruntime::WaitNotificationFn) Failed to allocate
immich_machine_learning  |                              memory for requested buffer of size 3706060800

Additional information

No response
