
Conversation

@giladgd giladgd (Contributor) commented Aug 24, 2025

Use the `VK_EXT_memory_budget` extension, when available, to read the memory consumption of a Vulkan device.

@giladgd giladgd requested a review from 0cc4m as a code owner August 24, 2025 17:58
@github-actions github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Aug 24, 2025
@jeffbolznv (Collaborator)

Is this solving any particular problem?

@giladgd giladgd (Contributor, author) commented Aug 24, 2025

Yes, I forgot to mention.
In downstream projects (like node-llama-cpp in my case) that use `ggml_backend_dev_memory` to check the memory usage of a backend device, the CUDA backend reports the actual memory usage just fine, but the Vulkan backend always reports the entire memory as free regardless of actual usage. This PR fixes that.
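
For illustration, a minimal sketch of that kind of query through the public ggml API (the enumeration loop is illustrative, not node-llama-cpp's actual code; it assumes ggml was built with the relevant backends linked in):

```cpp
#include "ggml-backend.h"
#include <cstdio>

int main() {
    // Enumerate all registered backend devices and query their memory state.
    for (size_t i = 0; i < ggml_backend_dev_count(); ++i) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        size_t free = 0, total = 0;
        ggml_backend_dev_memory(dev, &free, &total);
        // Before this PR, the Vulkan backend always reported free == total here.
        printf("%s: %zu bytes free / %zu bytes total\n",
               ggml_backend_dev_name(dev), free, total);
    }
    return 0;
}
```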

@slaren slaren (Member) commented Aug 24, 2025

Additionally, in llama.cpp, when using multiple GPUs, the free memory is used to determine the default layer split.


```cpp
for (const auto & ext : extensionprops) {
    if (std::string(ext.extensionName.data()) == VK_EXT_MEMORY_BUDGET_EXTENSION_NAME) {
        membudget_extension_supported = true;
```
Collaborator:

This list can include hundreds of extensions, I think you should precompute this when the instance is created.

Contributor Author (giladgd):

Good idea, I've moved the support detection to `ggml_vk_instance_init`.


```cpp
bool membudget_supported = false;
for (const auto & ext : extensionprops) {
    if (std::string(ext.extensionName.data()) == VK_EXT_MEMORY_BUDGET_EXTENSION_NAME) {
```
Collaborator:

I'd prefer strcmp, but it's not a huge deal.
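
For reference, a sketch of the `strcmp` form (not the exact committed code; it assumes `<cstring>` is included):

```cpp
for (const auto & ext : extensionprops) {
    // Compare the null-terminated extension name directly, avoiding a
    // temporary std::string allocation for each of the (potentially
    // hundreds of) reported extensions.
    if (strcmp(ext.extensionName.data(), VK_EXT_MEMORY_BUDGET_EXTENSION_NAME) == 0) {
        membudget_supported = true;
        break;
    }
}
```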

Contributor Author (giladgd):

Done

@0cc4m 0cc4m (Collaborator) commented Aug 31, 2025

This implementation does not work yet. The problem is that `heapUsage` only shows the current process's heap usage, which at the start of the process is basically 0; see the VK_EXT_memory_budget documentation. The correct way is to return the `memoryBudget` instead. I have fixed this and also combined the two `getMemoryProperties` calls.

```diff
diff --git a/ggml/src/ggml-vulkan/ggml-vulkan.cpp b/ggml/src/ggml-vulkan/ggml-vulkan.cpp
index b7f8b5a38..96e244c72 100644
--- a/ggml/src/ggml-vulkan/ggml-vulkan.cpp
+++ b/ggml/src/ggml-vulkan/ggml-vulkan.cpp
@@ -11497,25 +11497,24 @@ void ggml_backend_vk_get_device_memory(int device, size_t * free, size_t * total
     GGML_ASSERT(device < (int) vk_instance.device_supports_membudget.size());
 
     vk::PhysicalDevice vkdev = vk_instance.instance.enumeratePhysicalDevices()[vk_instance.device_indices[device]];
-    vk::PhysicalDeviceMemoryProperties memprops = vkdev.getMemoryProperties();
     bool membudget_supported = vk_instance.device_supports_membudget[device];
 
+    vk::PhysicalDeviceMemoryProperties2 memprops;
     vk::PhysicalDeviceMemoryBudgetPropertiesEXT budgetprops;
-    vk::PhysicalDeviceMemoryProperties2 memprops2 = {};
 
     if (membudget_supported) {
-        memprops2.pNext = &budgetprops;
-        vkdev.getMemoryProperties2(&memprops2);
+        memprops.pNext = &budgetprops;
     }
+    vkdev.getMemoryProperties2(&memprops);
 
-    for (uint32_t i = 0; i < memprops.memoryHeapCount; ++i) {
-        const vk::MemoryHeap & heap = memprops.memoryHeaps[i];
+    for (uint32_t i = 0; i < memprops.memoryProperties.memoryHeapCount; ++i) {
+        const vk::MemoryHeap & heap = memprops.memoryProperties.memoryHeaps[i];
 
         if (heap.flags & vk::MemoryHeapFlagBits::eDeviceLocal) {
             *total = heap.size;
 
             if (membudget_supported && i < budgetprops.heapUsage.size()) {
-                *free = *total - budgetprops.heapUsage[i];
+                *free = budgetprops.heapBudget[i];
             } else {
                 *free = heap.size;
             }
```

As a sidenote, for whatever reason Intel shows a pretty low budget, despite empty VRAM:

```
memoryHeaps[0]:
        size   = 16810770432 (0x3ea000000) (15.66 GiB)
        budget = 14891876352 (0x377a00000) (13.87 GiB)
```

while AMD looks as expected:

```
memoryHeaps[1]:
        size   = 17163091968 (0x3ff000000) (15.98 GiB)
        budget = 17152225280 (0x3fe5a3000) (15.97 GiB)
```

and so does Nvidia:

```
memoryHeaps[0]:
        size   = 25769803776 (0x600000000) (24.00 GiB)
        budget = 25281167360 (0x5e2e00000) (23.54 GiB)
```

@giladgd giladgd (Contributor, author) commented Aug 31, 2025

@0cc4m Good catch; I had only tested from within the current process, where this method seemed more precise and reported the same memory footprint across both the Vulkan and CUDA backends.

It appears that `budgetprops.heapBudget[i]` reports the available memory budget excluding the usage of the current process, so `budgetprops.heapBudget[i] - budgetprops.heapUsage[i]` seems to be what we want.

I noticed that `budgetprops.heapBudget[i]` includes some Vulkan overhead related to the current process: checking it before and after loading gpt-oss 20b mxfp4 showed a difference of -21.88 MB.
Maybe it's worth adding an additional method to `ggml_backend_device_i` that checks the memory usage of the current process in isolation, to make precise inspection easier. I can do that in another PR.
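
In code, the combined logic lands roughly here (a sketch building on the diff above; the fallback path for devices without the extension and the surrounding function are omitted):

```cpp
// Assumes membudget_supported was cached at instance init (see earlier in
// this thread) and vkdev is the selected vk::PhysicalDevice.
vk::PhysicalDeviceMemoryBudgetPropertiesEXT budgetprops;
vk::PhysicalDeviceMemoryProperties2 memprops;
memprops.pNext = &budgetprops;
vkdev.getMemoryProperties2(&memprops);

for (uint32_t i = 0; i < memprops.memoryProperties.memoryHeapCount; ++i) {
    const vk::MemoryHeap & heap = memprops.memoryProperties.memoryHeaps[i];
    if (heap.flags & vk::MemoryHeapFlagBits::eDeviceLocal) {
        *total = heap.size;
        // heapBudget excludes other processes' usage but includes this
        // process's own allocations, so subtracting heapUsage yields the
        // memory still available to this process.
        *free = budgetprops.heapBudget[i] - budgetprops.heapUsage[i];
        break;
    }
}
```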

Also, thanks for testing this with various GPUs!
I only have access to a machine with an Nvidia GPU (besides my own Mac), so I can only test on that.

@0cc4m 0cc4m (Collaborator) commented Sep 1, 2025

> It appears that `budgetprops.heapBudget[i]` reports the available memory budget excluding the usage of the current process, so `budgetprops.heapBudget[i] - budgetprops.heapUsage[i]` seems to be what we want.

Oh yeah, that's true. I only thought about the initial value for layer estimations, but of course you can keep using it later in the program. Thank you.

@0cc4m 0cc4m (Collaborator) left a comment

It's working as expected now, at least on AMD and Nvidia. On Intel the number doesn't change from the ~14/16 GB it shows, regardless of how loaded the GPU is. But that is a driver issue.

@0cc4m 0cc4m merged commit d4d8dbe into ggml-org:master Sep 1, 2025
45 of 48 checks passed
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025

* vulkan: use memory budget extension to read memory usage

* fix: formatting and names

* formatting

* fix: detect and cache memory budget extension availability on init

* fix: read `budgetprops.heapBudget` instead of `heap.size` when memory budget extension is available

* style: lints
