vulkan: use memory budget extension to read memory usage #15545
Conversation
Is this solving any particular problem?
Yes, I forgot to mention. Additionally, in llama.cpp, when using multiple GPUs, the free memory is used to determine the default layer split.
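For illustration, a rough sketch of how per-device free memory can drive a default layer split. `split_layers_by_free_memory` is a hypothetical helper, not llama.cpp's actual splitting code (which works on tensor-split proportions); it just shows the idea of assigning layers in proportion to what each backend reports as free.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: assign n_layers across devices in proportion to the
// free memory each backend reports (e.g. via ggml_backend_vk_get_device_memory).
static std::vector<int> split_layers_by_free_memory(const std::vector<size_t> & free_mem, int n_layers) {
    size_t total_free = 0;
    for (size_t f : free_mem) {
        total_free += f;
    }

    std::vector<int> layers(free_mem.size(), 0);
    if (free_mem.empty() || total_free == 0) {
        return layers;
    }

    int assigned = 0;
    for (size_t i = 0; i < free_mem.size(); ++i) {
        layers[i] = (int) ((double) n_layers * ((double) free_mem[i] / (double) total_free));
        assigned += layers[i];
    }
    layers.back() += n_layers - assigned;  // hand rounding leftovers to the last device
    return layers;
}

int main() {
    // Two GPUs reporting 24 GiB and 8 GiB free: a 32-layer model splits roughly 24/8.
    const std::vector<size_t> free_mem = { 24ull << 30, 8ull << 30 };
    for (int n : split_layers_by_free_memory(free_mem, 32)) {
        printf("%d ", n);
    }
    printf("\n");
    return 0;
}
```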
ggml/src/ggml-vulkan/ggml-vulkan.cpp (outdated)

```cpp
for (const auto & ext : extensionprops) {
    if (std::string(ext.extensionName.data()) == VK_EXT_MEMORY_BUDGET_EXTENSION_NAME) {
        membudget_extension_supported = true;
```
This list can include hundreds of extensions; I think you should precompute this when the instance is created.
Good idea, I've moved the support detection to `ggml_vk_instance_init`.
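A rough sketch of that approach (names such as `device_supports_membudget` follow the PR, but the helper below is made up and the exact implementation may differ): enumerate each physical device's extensions once at instance init and cache whether VK_EXT_memory_budget is present, so the memory query only has to read a flag.

```cpp
#include <string>
#include <vector>
#include <vulkan/vulkan.hpp>

// Sketch only: run once from ggml_vk_instance_init instead of on every memory query.
static void ggml_vk_detect_membudget_support(vk::Instance instance,
                                             std::vector<bool> & device_supports_membudget) {
    std::vector<vk::PhysicalDevice> devices = instance.enumeratePhysicalDevices();
    device_supports_membudget.assign(devices.size(), false);

    for (size_t i = 0; i < devices.size(); ++i) {
        // enumerateDeviceExtensionProperties can return hundreds of entries,
        // which is why the result is cached rather than recomputed per query.
        for (const auto & ext : devices[i].enumerateDeviceExtensionProperties()) {
            if (std::string(ext.extensionName.data()) == VK_EXT_MEMORY_BUDGET_EXTENSION_NAME) {
                device_supports_membudget[i] = true;
                break;
            }
        }
    }
}
```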
ggml/src/ggml-vulkan/ggml-vulkan.cpp (outdated)

```cpp
bool membudget_supported = false;
for (const auto & ext : extensionprops) {
    if (std::string(ext.extensionName.data()) == VK_EXT_MEMORY_BUDGET_EXTENSION_NAME) {
```
I'd prefer strcmp, but it's not a huge deal.
Done
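For reference, a sketch of the strcmp-based comparison (the helper name here is invented; the PR simply changes the comparison in place). It avoids constructing a temporary std::string for every extension entry.

```cpp
#include <cstring>
#include <vector>
#include <vulkan/vulkan.hpp>

// Returns true if VK_EXT_memory_budget appears in the enumerated extension list,
// comparing the fixed-size extensionName buffer directly with strcmp.
static bool ggml_vk_has_memory_budget_ext(const std::vector<vk::ExtensionProperties> & extensionprops) {
    for (const auto & ext : extensionprops) {
        if (strcmp(ext.extensionName.data(), VK_EXT_MEMORY_BUDGET_EXTENSION_NAME) == 0) {
            return true;
        }
    }
    return false;
}
```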
This implementation does not work yet. The problem is that heapUsage will only show the current process heap usage, which at the start of the process is basically 0. See the VK_EXT_memory_budget documentation. The correct way is to return the memoryBudget instead. I have fixed this and also combined the two getMemoryProperties calls.

```diff
diff --git a/ggml/src/ggml-vulkan/ggml-vulkan.cpp b/ggml/src/ggml-vulkan/ggml-vulkan.cpp
index b7f8b5a38..96e244c72 100644
--- a/ggml/src/ggml-vulkan/ggml-vulkan.cpp
+++ b/ggml/src/ggml-vulkan/ggml-vulkan.cpp
@@ -11497,25 +11497,24 @@ void ggml_backend_vk_get_device_memory(int device, size_t * free, size_t * total) {
     GGML_ASSERT(device < (int) vk_instance.device_supports_membudget.size());
     vk::PhysicalDevice vkdev = vk_instance.instance.enumeratePhysicalDevices()[vk_instance.device_indices[device]];
-    vk::PhysicalDeviceMemoryProperties memprops = vkdev.getMemoryProperties();
     bool membudget_supported = vk_instance.device_supports_membudget[device];
+    vk::PhysicalDeviceMemoryProperties2 memprops;
     vk::PhysicalDeviceMemoryBudgetPropertiesEXT budgetprops;
-    vk::PhysicalDeviceMemoryProperties2 memprops2 = {};
     if (membudget_supported) {
-        memprops2.pNext = &budgetprops;
-        vkdev.getMemoryProperties2(&memprops2);
+        memprops.pNext = &budgetprops;
     }
+    vkdev.getMemoryProperties2(&memprops);
-    for (uint32_t i = 0; i < memprops.memoryHeapCount; ++i) {
-        const vk::MemoryHeap & heap = memprops.memoryHeaps[i];
+    for (uint32_t i = 0; i < memprops.memoryProperties.memoryHeapCount; ++i) {
+        const vk::MemoryHeap & heap = memprops.memoryProperties.memoryHeaps[i];
         if (heap.flags & vk::MemoryHeapFlagBits::eDeviceLocal) {
             *total = heap.size;
             if (membudget_supported && i < budgetprops.heapUsage.size()) {
-                *free = *total - budgetprops.heapUsage[i];
+                *free = budgetprops.heapBudget[i];
             } else {
                 *free = heap.size;
             }
```

As a sidenote, for whatever reason Intel shows a pretty low budget despite empty VRAM, while AMD and Nvidia look as expected.
@0cc4m Good catch, I've only run tests from the current process, and this method seemed more precise and reported the same memory footprint across both Vulkan and CUDA backends. It appears that heapUsage only covers the current process, so returning heapBudget is indeed the better approach. Also, thanks for testing this with various GPUs!
Oh yeah, that's true. I only thought about the initial value for layer estimations, but of course you can keep using it later in the program. Thank you.
It's working as expected now, at least on AMD and Nvidia. On Intel the number doesn't change from the ~14/16 GB it shows, regardless of how loaded the GPU is. But that is a driver issue.
Use the memory budget extension when it's available to read the memory consumption of a Vulkan device.
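As a standalone illustration of the query this PR relies on (a sketch, not the PR code; it assumes the physical device advertises VK_EXT_memory_budget and that getMemoryProperties2 is available, e.g. on a Vulkan 1.1 instance), the following prints heapBudget next to heapUsage for each device-local heap, which makes the difference discussed above visible: budget estimates how much this process can still allocate, while usage only counts this process's own allocations.

```cpp
#include <cstdio>
#include <vulkan/vulkan.hpp>

// Sketch: chain VkPhysicalDeviceMemoryBudgetPropertiesEXT into a
// getMemoryProperties2 query and report budget vs. usage per device-local heap.
static void print_heap_budgets(vk::PhysicalDevice dev) {
    vk::PhysicalDeviceMemoryBudgetPropertiesEXT budgetprops;
    vk::PhysicalDeviceMemoryProperties2 memprops;
    memprops.pNext = &budgetprops;  // extension struct is filled alongside the core properties
    dev.getMemoryProperties2(&memprops);

    for (uint32_t i = 0; i < memprops.memoryProperties.memoryHeapCount; ++i) {
        const vk::MemoryHeap & heap = memprops.memoryProperties.memoryHeaps[i];
        if (heap.flags & vk::MemoryHeapFlagBits::eDeviceLocal) {
            printf("heap %u: size %llu MiB, budget %llu MiB, usage %llu MiB\n", i,
                   (unsigned long long) (heap.size >> 20),
                   (unsigned long long) (budgetprops.heapBudget[i] >> 20),
                   (unsigned long long) (budgetprops.heapUsage[i] >> 20));
        }
    }
}
```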