--offload-to-cpu may cause OOM errors on Vulkan #791

@wbruna

Description

It seems the offloading code attempts to allocate a single device buffer for all of the offloaded weights. This can easily exceed Vulkan's single-allocation limit.

For instance, with an SDXL model and the DMD2 LoRA (which work fine without --offload-to-cpu):

[INFO ] stable-diffusion.cpp:846  - attempting to apply 1 LoRAs
[INFO ] model.cpp:1038 - load /opt/sdif/models/LoRA/dmd2_sdxl_4step_lora_fp16.safetensors using safetensors format
[INFO ] lora.hpp:118  - loading LoRA from '/opt/sdif/models/LoRA/dmd2_sdxl_4step_lora_fp16.safetensors'
  |==================================================| 2364/2364 - 2364000.00it/s
  |==================================================| 2364/2364 - 7625.81it/s
ggml_vulkan: Device memory allocation of size 5365186568 failed.
ggml_vulkan: Requested buffer size exceeds device memory allocation limit: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 5365186568
[ERROR] ggml_extend.hpp:1412 - lora: failed to allocate the compute buffer

Related: ggml-org/llama.cpp#15815
