Closed
Description
It seems the offloading code attempts to allocate a single buffer on the device for all of the offloaded weights. This can easily exceed Vulkan's per-allocation size limit, even when the device has enough total memory.
For instance, with an SDXL model and the DMD2 LoRA (both of which work fine without --offload-to-cpu):
[INFO ] stable-diffusion.cpp:846 - attempting to apply 1 LoRAs
[INFO ] model.cpp:1038 - load /opt/sdif/models/LoRA/dmd2_sdxl_4step_lora_fp16.safetensors using safetensors format
[INFO ] lora.hpp:118 - loading LoRA from '/opt/sdif/models/LoRA/dmd2_sdxl_4step_lora_fp16.safetensors'
|==================================================| 2364/2364 - 2364000.00it/s
|==================================================| 2364/2364 - 7625.81it/s
ggml_vulkan: Device memory allocation of size 5365186568 failed.
ggml_vulkan: Requested buffer size exceeds device memory allocation limit: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 5365186568
[ERROR] ggml_extend.hpp:1412 - lora: failed to allocate the compute buffer
Related: ggml-org/llama.cpp#15815
amitbar05