
Conversation

@System233

Users can manually specify the memory size of a device with the GGML_VK_DEVICE{idx}_MEMORY environment variable, so the workload can be allocated according to their specific needs.

For example, setting GGML_VK_DEVICE0_MEMORY=2000000000 configures 2000 MB of memory for the Vulkan0 device, and a corresponding 2000 MB share of the model computation workload is allocated to that device.

This is especially useful in environments with integrated graphics: users no longer need to reboot into the BIOS to configure VRAM, nor worry about memory reserved as VRAM sitting idle. Simply setting the GGML_VK_DEVICE{idx}_MEMORY environment variable to the desired amount is a significant benefit for future devices equipped with high-performance integrated graphics.

Finally, the original behavior is preserved when GGML_VK_DEVICE{idx}_MEMORY is not set, just like with GGML_VK_VISIBLE_DEVICES.
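For illustration, here is a minimal sketch of how such a per-device override could be read, assuming the value is given in bytes and that an unset or unparsable variable falls back to the driver-reported size; the function name and surrounding code are hypothetical and not the actual patch in this PR:

```cpp
// Hypothetical sketch: read a GGML_VK_DEVICE{idx}_MEMORY override for device `idx`.
// Returns the override in bytes, or `reported_bytes` (the size the driver reports)
// when the variable is unset or invalid, preserving the default behavior.
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <string>

static uint64_t vk_device_memory_override(int idx, uint64_t reported_bytes) {
    std::string name = "GGML_VK_DEVICE" + std::to_string(idx) + "_MEMORY";
    const char * val = std::getenv(name.c_str());
    if (val == nullptr || *val == '\0') {
        return reported_bytes;          // variable not set: keep original behavior
    }
    char * end = nullptr;
    unsigned long long parsed = std::strtoull(val, &end, 10);
    if (end == val || parsed == 0) {
        return reported_bytes;          // unparsable or zero: ignore the override
    }
    return (uint64_t) parsed;           // value is interpreted as bytes
}

int main() {
    // Example: with GGML_VK_DEVICE0_MEMORY=2000000000 in the environment,
    // device 0 would report 2000 MB instead of the driver-reported 8 GiB.
    uint64_t mem = vk_device_memory_override(0, 8ull * 1024 * 1024 * 1024);
    std::printf("device 0 usable memory: %llu bytes\n", (unsigned long long) mem);
    return 0;
}
```

Falling back to the reported size when the variable is absent is what keeps the default behavior unchanged, as described above.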


wbruna commented Feb 16, 2025

Since my 3400G seems to behave the same for any GGML_VK_DEVICE0_MEMORY value above 0, I guess this only matters for splitting across devices. But if that's the case, perhaps it'd be better to extend LLAMA_ARG_TENSOR_SPLIT to accept absolute values?

users no longer need to reboot and enter the BIOS to configure VRAM

That's unfortunately not quite the same: I notice significant speed improvements with reserved VRAM instead of shared memory (see some of my tests on the Vulkan speed discussion: #10879 (comment)).


0cc4m commented Feb 17, 2025

I think tensor split is the only thing this affects (apart from some log lines), and not in a way different from the tensor-split argument.

@System233 Can you give more detail on what you are trying to solve here? Are you using multiple GPUs? In the single-GPU case this doesn't do anything, apart from changing the memory that the application reports.

@System233 (Author)

Apologies, it was my mistake. Previously, I was using the Vulkan backend for Llama in LMStudio, and I removed LMS's GPU device filtering, allowing it to call the 780M integrated GPU. However, it consistently only allocated 768MB of VRAM as reported by Llama. I didn’t want to assign too much dedicated VRAM to the integrated GPU in the BIOS, so I modified Llama's code, then recompiled and replaced LMS's Vulkan backend.

@0cc4m You were right. Using llama-cli with the -ts parameter makes it very convenient to configure the workload for each device. I’m really sorry about that.
@wbruna Thank you for providing the benchmark. This is the first time I’ve seen a comparison of the performance difference between dedicated and shared VRAM on integrated GPUs. I previously thought there wasn’t much of a difference, and I even believed allocating large amounts of VRAM to integrated graphics was unnecessary.

System233 closed this Feb 17, 2025

0cc4m commented Feb 17, 2025

Apologies, it was my mistake. Previously, I was using the Vulkan backend for Llama in LMStudio, and I removed LMS's GPU device filtering, allowing it to call the 780M integrated GPU. However, it consistently only allocated 768MB of VRAM as reported by Llama. I didn’t want to assign too much dedicated VRAM to the integrated GPU in the BIOS, so I modified Llama's code, then recompiled and replaced LMS's Vulkan backend.

I understand, so LMStudio uses the amount of available memory in some way. They were probably just thinking of dedicated GPUs, since integrated GPUs can use more than just the small portion of RAM that is dedicated to them.

If you can manually set the number of GPU layers in LMStudio, you should be able to set it much higher on your 780M than the VRAM size would indicate. You should be able to use at least up to half of your RAM without any BIOS changes.
