Closed
Description
It seems the offloading code attempts to allocate a single buffer on the device for all of the offloaded weights. This can easily exceed Vulkan's per-allocation size limit, even when the device has enough total memory.
For instance, with an SDXL model and the DMD2 LoRA (both of which work fine without --offload-to-cpu):
[INFO ] stable-diffusion.cpp:846 - attempting to apply 1 LoRAs
[INFO ] model.cpp:1038 - load /opt/sdif/models/LoRA/dmd2_sdxl_4step_lora_fp16.safetensors using safetensors format
[INFO ] lora.hpp:118 - loading LoRA from '/opt/sdif/models/LoRA/dmd2_sdxl_4step_lora_fp16.safetensors'
|==================================================| 2364/2364 - 2364000.00it/s
|==================================================| 2364/2364 - 7625.81it/s
ggml_vulkan: Device memory allocation of size 5365186568 failed.
ggml_vulkan: Requested buffer size exceeds device memory allocation limit: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 5365186568
[ERROR] ggml_extend.hpp:1412 - lora: failed to allocate the compute buffer
Related: ggml-org/llama.cpp#15815
amitbar05