[Feature] Machine Learning should unload model in case of GPU memory issues #18131
schuettecarsten started this conversation in Feature Request
Replies: 0 comments
- I have searched the existing feature requests, both open and closed, to make sure this is not a duplicate request.
The feature
My GPU has 4 GB of memory, which is not enough to hold all models at once. Immich ML unloads models after an idle period, but when new images are uploaded, Smart Search and Face Recognition run concurrently and each loads its own model, which sometimes leads to OOM errors. It is unclear whether Immich later retries processing these images or whether they are simply skipped by Smart Search and Face Recognition when the ML job fails.

It would also be great if Immich ML could prefer pending jobs whose models are already loaded, and, if there is not enough memory to load a model, first unload other models and then retry.
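To illustrate the requested behavior, here is a minimal sketch of an LRU model cache that evicts other models and retries when a load would exceed the memory budget. All names (`ModelCache`, the megabyte-based memory model) are illustrative assumptions, not Immich's actual ML internals:

```python
from collections import OrderedDict


class ModelCache:
    """Hypothetical sketch: keep loaded models in LRU order and, when a new
    model does not fit, evict least-recently-used models instead of failing
    with OOM. Not based on Immich's real implementation."""

    def __init__(self, budget_mb: int):
        self.budget_mb = budget_mb
        # Maps model name -> approximate size in MB, in LRU order.
        self.loaded: "OrderedDict[str, int]" = OrderedDict()

    def _used(self) -> int:
        return sum(self.loaded.values())

    def load(self, name: str, size_mb: int) -> bool:
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently used
            return True
        # Evict least-recently-used models until the new one fits.
        while self._used() + size_mb > self.budget_mb and self.loaded:
            self.loaded.popitem(last=False)
        if size_mb > self.budget_mb:
            return False  # model alone exceeds the budget: genuine OOM
        self.loaded[name] = size_mb
        return True
```

A job scheduler built on top of this could additionally sort pending jobs so that those whose model is already in `self.loaded` run first, reducing load/unload churn on small GPUs.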
The exceptions are:
They are expected and harmless, so this is not really a bug, but the way the ML module behaves in such cases could be optimized.
Platform