ML OCR memory leaks? #23462

@m41denx

Description

I have searched the existing issues, both open and closed, to make sure this is not a duplicate report.

  • Yes

The bug

I have over 30k photos and videos on my Immich server, which has no GPU. After uploading a batch of fresh photos, I run immich-ml on my PC (RTX 4070 Super) via Docker Desktop, and it crunches through face detection and smart search in mere minutes. OCR, however, runs like a turtle, so I left it running overnight.

When I came back 5 hours later, the container had eaten all my resources:

  • CPU: 99.97% (AMD Ryzen 5600X, 6c/12t)
  • RAM (container): 12.32 GB / 15.58 GB
  • GPU VRAM: 11.7/12 GB dedicated, 15.5/16 GB shared, 27.2/28 GB total (ReBAR?)

You can see from the logs that it all happened in just one hour.

This behavior is not isolated to Docker Desktop, WSL, or CUDA. When Immich switched to its internal ML server, it maxed out a 48c/96t Xeon CPU, and the load did not drop even after I cancelled the OCR job and cleared the queue. The only fix was to restart the ML container.

Used model: PP-OCRv5_server (I will test the mobile version and report its performance later. UPDATE: PP-OCRv5_mobile doesn't have this issue.)
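A possible mitigation to try while this is investigated (a sketch, not a confirmed fix): ONNX Runtime's CUDA execution provider allocates from a BFC arena that grows on demand and, by default, is never shrunk, which matches the steadily climbing VRAM above. The `gpu_mem_limit` and `arena_extend_strategy` provider options can cap that growth. The helper name, the 8 GiB cap, and the model path below are placeholders, not Immich's actual code:

```python
# Sketch: cap the CUDA EP's BFC arena so it cannot grow unbounded.
# Assumes onnxruntime-gpu is installed; the cap and model path are placeholders.

def cuda_provider_options(vram_cap_gib: int) -> dict:
    """Build provider options that limit the CUDA EP's memory arena."""
    return {
        "gpu_mem_limit": vram_cap_gib * 1024**3,      # hard cap, in bytes
        "arena_extend_strategy": "kSameAsRequested",  # extend only by what is asked
    }

# import onnxruntime as ort
# session = ort.InferenceSession(
#     "PP-OCRv5_det.onnx",
#     providers=[("CUDAExecutionProvider", cuda_provider_options(8)),
#                "CPUExecutionProvider"],
# )
```

With a cap in place, an allocation like the 3.7 GB buffer in the log below would fail earlier and more predictably instead of after the arena has consumed all VRAM; it does not explain why the CPU load persisted after the job was cancelled.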

Not related, but for some reason I couldn't run OCR in parallel (concurrency > 1); the ML worker errors with:

[E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running FusedConv node. Name:'Conv.0' Status Message: CUDA error cudaErrorStreamCaptureUnsupported:operation not permitted when stream is capturing
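The `cudaErrorStreamCaptureUnsupported` message suggests that concurrent `session.run()` calls are colliding while the CUDA EP is capturing a stream, since most CUDA operations are forbidden during capture. As a hedged workaround sketch (not a fix for the underlying bug), serializing runs of the shared session with a lock should avoid the collision at the cost of parallelism; `SerializedSession` is a hypothetical wrapper, not part of Immich:

```python
# Sketch: serialize session.run() calls across worker threads, so that no
# second run can start while the CUDA EP is mid stream-capture.
# `session` stands in for the OCR model's onnxruntime InferenceSession.
import threading


class SerializedSession:
    """Wraps an inference session so only one thread runs it at a time."""

    def __init__(self, session):
        self._session = session
        self._lock = threading.Lock()

    def run(self, output_names, input_feed):
        # Holding the lock forces concurrent callers to queue up.
        with self._lock:
            return self._session.run(output_names, input_feed)
```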

The OS that Immich Server is running on

Debian 13 (via docker compose)

Version of Immich Server

v2.2.0

Version of Immich Mobile App

irrelevant

Platform with the issue

  • Server
  • Web
  • Mobile

Device make and model

No response

Your docker-compose.yml content

# ML worker compose content
name: immich

services:
  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, rocm, openvino, rknn] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda
    extends: # uncomment this section for hardware acceleration - see https://docs.immich.app/features/ml-hardware-acceleration
      file: hwaccel.ml.yml
      service: cuda # set to one of [armnn, cuda, rocm, openvino, openvino-wsl, rknn] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    environment:
      - IMMICH_VERSION=v2
    restart: always
    healthcheck:
      disable: false
    ports:
      - 3003:3003
volumes:
  model-cache:

Your .env content

IMMICH_VERSION=v2

Reproduction steps

described above

Relevant log output

immich_machine_learning  | [11/01/25 00:49:08] INFO     Setting execution providers to
immich_machine_learning  |                              ['CUDAExecutionProvider', 'CPUExecutionProvider'],
immich_machine_learning  |                              in descending order of preference
immich_machine_learning  | [11/01/25 00:49:08] INFO     Using engine_name: onnxruntime
immich_machine_learning  | 2025-11-01 01:39:54.138027723 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Concat node. Name:'Concat.16' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 3706060800
immich_machine_learning  |
immich_machine_learning  | [11/01/25 01:39:54] ERROR    Exception in ASGI application
immich_machine_learning  |
immich_machine_learning  |                              ╭─────── Traceback (most recent call last) ───────╮
immich_machine_learning  |                              │ /opt/venv/lib/python3.11/site-packages/rapidocr │
immich_machine_learning  |                              │ /inference_engine/onnxruntime/main.py:90 in     │
immich_machine_learning  |                              │ __call__                                        │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │    87 │   def __call__(self, input_content: np. │
immich_machine_learning  |                              │    88 │   │   input_dict = dict(zip(self.get_in │
immich_machine_learning  |                              │    89 │   │   try:                              │
immich_machine_learning  |                              │ ❱  90 │   │   │   return self.session.run(self. │
immich_machine_learning  |                              │    91 │   │   except Exception as e:            │
immich_machine_learning  |                              │    92 │   │   │   error_info = traceback.format │
immich_machine_learning  |                              │    93 │   │   │   raise ONNXRuntimeError(error_ │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │ /opt/venv/lib/python3.11/site-packages/onnxrunt │
immich_machine_learning  |                              │ ime/capi/onnxruntime_inference_collection.py:22 │
immich_machine_learning  |                              │ 0 in run                                        │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │    217 │   │   if not output_names:             │
immich_machine_learning  |                              │    218 │   │   │   output_names = [output.name  │
immich_machine_learning  |                              │    219 │   │   try:                             │
immich_machine_learning  |                              │ ❱  220 │   │   │   return self._sess.run(output │
immich_machine_learning  |                              │    221 │   │   except C.EPFail as err:          │
immich_machine_learning  |                              │    222 │   │   │   if self._enable_fallback:    │
immich_machine_learning  |                              │    223 │   │   │   │   print(f"EP Error: {err!s │
immich_machine_learning  |                              ╰─────────────────────────────────────────────────╯
immich_machine_learning  |                              RuntimeException: [ONNXRuntimeError] : 6 :
immich_machine_learning  |                              RUNTIME_EXCEPTION : Non-zero status code returned
immich_machine_learning  |                              while running Concat node. Name:'Concat.16' Status
immich_machine_learning  |                              Message:
immich_machine_learning  |                              /onnxruntime_src/onnxruntime/core/framework/bfc_are
immich_machine_learning  |                              na.cc:376 void*
immich_machine_learning  |                              onnxruntime::BFCArena::AllocateRawInternal(size_t,
immich_machine_learning  |                              bool, onnxruntime::Stream*, bool,
immich_machine_learning  |                              onnxruntime::WaitNotificationFn) Failed to allocate
immich_machine_learning  |                              memory for requested buffer of size 3706060800
immich_machine_learning  |
immich_machine_learning  |
immich_machine_learning  |                              The above exception was the direct cause of the
immich_machine_learning  |                              following exception:
immich_machine_learning  |
immich_machine_learning  |                              ╭─────── Traceback (most recent call last) ───────╮
immich_machine_learning  |                              │ /usr/src/immich_ml/main.py:177 in predict       │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │   174 │   │   inputs = text                     │
immich_machine_learning  |                              │   175 │   else:                                 │
immich_machine_learning  |                              │   176 │   │   raise HTTPException(400, "Either  │
immich_machine_learning  |                              │ ❱ 177 │   response = await run_inference(inputs │
immich_machine_learning  |                              │   178 │   return ORJSONResponse(response)       │
immich_machine_learning  |                              │   179                                           │
immich_machine_learning  |                              │   180                                           │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │ /usr/src/immich_ml/main.py:202 in run_inference │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │   199 │   │   response[entry["task"]] = output  │
immich_machine_learning  |                              │   200 │                                         │
immich_machine_learning  |                              │   201 │   without_deps, with_deps = entries     │
immich_machine_learning  |                              │ ❱ 202 │   await asyncio.gather(*[_run_inference │
immich_machine_learning  |                              │   203 │   if with_deps:                         │
immich_machine_learning  |                              │   204 │   │   await asyncio.gather(*[_run_infer │
immich_machine_learning  |                              │   205 │   if isinstance(payload, Image):        │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │ /usr/src/immich_ml/main.py:197 in               │
immich_machine_learning  |                              │ _run_inference                                  │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │   194 │   │   │   │   message = f"Task {entry[' │
immich_machine_learning  |                              │       output of {dep}"                          │
immich_machine_learning  |                              │   195 │   │   │   │   raise HTTPException(400,  │
immich_machine_learning  |                              │   196 │   │   model = await load(model)         │
immich_machine_learning  |                              │ ❱ 197 │   │   output = await run(model.predict, │
immich_machine_learning  |                              │   198 │   │   outputs[model.identity] = output  │
immich_machine_learning  |                              │   199 │   │   response[entry["task"]] = output  │
immich_machine_learning  |                              │   200                                           │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │ /usr/src/immich_ml/main.py:215 in run           │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │   212 │   if thread_pool is None:               │
immich_machine_learning  |                              │   213 │   │   return func(*args, **kwargs)      │
immich_machine_learning  |                              │   214 │   partial_func = partial(func, *args, * │
immich_machine_learning  |                              │ ❱ 215 │   return await asyncio.get_running_loop │
immich_machine_learning  |                              │   216                                           │
immich_machine_learning  |                              │   217                                           │
immich_machine_learning  |                              │   218 async def load(model: InferenceModel) ->  │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │ /usr/local/lib/python3.11/concurrent/futures/th │
immich_machine_learning  |                              │ read.py:58 in run                               │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │ /usr/src/immich_ml/models/base.py:60 in predict │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │    57 │   │   self.load()                       │
immich_machine_learning  |                              │    58 │   │   if model_kwargs:                  │
immich_machine_learning  |                              │    59 │   │   │   self.configure(**model_kwargs │
immich_machine_learning  |                              │ ❱  60 │   │   return self._predict(*inputs)     │
immich_machine_learning  |                              │    61 │                                         │
immich_machine_learning  |                              │    62 │   @abstractmethod                       │
immich_machine_learning  |                              │    63 │   def _predict(self, *inputs: Any, **mo │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │ /usr/src/immich_ml/models/ocr/detection.py:68   │
immich_machine_learning  |in _predict                                     │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │   65 │   │   return session                     │
immich_machine_learning  |                              │   66 │                                          │
immich_machine_learning  |                              │   67 │   def _predict(self, inputs: bytes | Ima │
immich_machine_learning  |                              │ ❱ 68 │   │   results = self.model(decode_cv2(in │
immich_machine_learning  |                              │   69 │   │   if results.boxes is None or result │
immich_machine_learning  |                              │   70 │   │   │   return self._empty             │
immich_machine_learning  |                              │   71 │   │   return {                           │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │ /opt/venv/lib/python3.11/site-packages/rapidocr │
immich_machine_learning  |                              │ /ch_ppocr_det/main.py:59 in __call__            │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │    56 │   │   if prepro_img is None:            │
immich_machine_learning  |                              │    57 │   │   │   return TextDetOutput()        │
immich_machine_learning  |                              │    58 │   │                                     │
immich_machine_learning  |                              │ ❱  59 │   │   preds = self.session(prepro_img)  │
immich_machine_learning  |                              │    60 │   │   boxes, scores = self.postprocess_ │
immich_machine_learning  |                              │    61 │   │   if len(boxes) < 1:                │
immich_machine_learning  |                              │    62 │   │   │   return TextDetOutput()        │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │ /opt/venv/lib/python3.11/site-packages/rapidocr │
immich_machine_learning  |                              │ /inference_engine/onnxruntime/main.py:93 in     │
immich_machine_learning  |                              │ __call__                                        │
immich_machine_learning  |                              │                                                 │
immich_machine_learning  |                              │    90 │   │   │   return self.session.run(self. │
immich_machine_learning  |                              │    91 │   │   except Exception as e:            │
immich_machine_learning  |                              │    92 │   │   │   error_info = traceback.format │
immich_machine_learning  |                              │ ❱  93 │   │   │   raise ONNXRuntimeError(error_ │
immich_machine_learning  |                              │    94 │                                         │
immich_machine_learning  |                              │    95 │   def get_input_names(self) -> List[str │
immich_machine_learning  |                              │    96 │   │   return [v.name for v in self.sess │
immich_machine_learning  |                              ╰─────────────────────────────────────────────────╯
immich_machine_learning  |                              ONNXRuntimeError: Traceback (most recent call
immich_machine_learning  |                              last):
immich_machine_learning  |                                File
immich_machine_learning  |                              "/opt/venv/lib/python3.11/site-packages/rapidocr/in
immich_machine_learning  |                              ference_engine/onnxruntime/main.py", line 90, in
immich_machine_learning  |                              __call__
immich_machine_learning  |                                  return
immich_machine_learning  |                              self.session.run(self.get_output_names(),
immich_machine_learning  |                              input_dict)[0]
immich_machine_learning  |                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
immich_machine_learning  |                              ^^^^^^^^^^^^^
immich_machine_learning  |                                File
immich_machine_learning  |                              "/opt/venv/lib/python3.11/site-packages/onnxruntime
immich_machine_learning  |                              /capi/onnxruntime_inference_collection.py", line
immich_machine_learning  |                              220, in run
immich_machine_learning  |                                  return self._sess.run(output_names, input_feed,
immich_machine_learning  |                              run_options)
immich_machine_learning  |                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
immich_machine_learning  |                              ^^^^^^^^^^^^^
immich_machine_learning  |                              onnxruntime.capi.onnxruntime_pybind11_state.Runtime
immich_machine_learning  |                              Exception: [ONNXRuntimeError] : 6 :
immich_machine_learning  |                              RUNTIME_EXCEPTION : Non-zero status code returned
immich_machine_learning  |                              while running Concat node. Name:'Concat.16' Status
immich_machine_learning  |                              Message:
immich_machine_learning  |                              /onnxruntime_src/onnxruntime/core/framework/bfc_are
immich_machine_learning  |                              na.cc:376 void*
immich_machine_learning  |                              onnxruntime::BFCArena::AllocateRawInternal(size_t,
immich_machine_learning  |                              bool, onnxruntime::Stream*, bool,
immich_machine_learning  |                              onnxruntime::WaitNotificationFn) Failed to allocate
immich_machine_learning  |                              memory for requested buffer of size 3706060800

Additional information

No response
