
Conversation


@gyf304 gyf304 commented Oct 4, 2024

Some Vulkan devices (namely integrated graphics) have multiple memory heaps: a smaller dedicated heap and a larger heap shared with system memory.

ggml uses the first usable memory type, which usually resides on the smaller dedicated heap, and this can cause allocation failures. This patch adds an environment variable that forces allocation from a specific memory heap.
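The heap-forcing idea can be sketched with plain structs standing in for the real Vulkan types (the variable name GGML_VK_FORCE_HEAP_INDEX and the helper names below are illustrative assumptions, not the actual patch):

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// Simplified stand-in for VkMemoryType entries in
// VkPhysicalDeviceMemoryProperties::memoryTypes.
struct MemoryType {
    uint32_t property_flags; // e.g. DEVICE_LOCAL, HOST_VISIBLE bits
    uint32_t heap_index;     // which heap this type allocates from
};

// Returns the index of the first memory type matching required_flags,
// optionally restricted to a single heap (-1 means "any heap").
int find_properties(const std::vector<MemoryType>& types,
                    uint32_t required_flags, int force_heap_index) {
    for (size_t i = 0; i < types.size(); i++) {
        if ((types[i].property_flags & required_flags) != required_flags) {
            continue;
        }
        if (force_heap_index >= 0 &&
            types[i].heap_index != (uint32_t)force_heap_index) {
            continue;
        }
        return (int)i;
    }
    return -1;
}

// Reads the heap override from the environment; the variable name is a
// hypothetical example. Returns -1 when unset (no override).
int read_force_heap_index() {
    const char* s = std::getenv("GGML_VK_FORCE_HEAP_INDEX");
    return s ? std::atoi(s) : -1;
}
```

With the override unset the search behaves as before; with it set, only memory types backed by the chosen heap are candidates.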

@github-actions github-actions bot added Vulkan Issues specific to the Vulkan backend ggml changes relating to the ggml tensor library for machine learning labels Oct 4, 2024
@0cc4m 0cc4m self-requested a review October 4, 2024 19:20
```diff
-memory_type_index = find_properties(&mem_props, &mem_req, fallback_flags);
+memory_type_index = find_properties(&mem_props, &mem_req, fallback_flags, device->force_heap_index);
 buf->memory_property_flags = fallback_flags;
 }
```
Collaborator
Shouldn't this fallback already handle that situation? That's basically what it was introduced for. You have a smaller (DEVICE_LOCAL) and a larger heap (DEVICE_LOCAL, HOST_VISIBLE and HOST_COHERENT), so once the small one is exhausted you fall back to the larger one.
Can you add the vulkaninfo of the device where this fallback does not work and describe why?

Author

@gyf304 gyf304 Oct 4, 2024

vulkaninfo.txt

The problem is that find_properties doesn't take memory usage into account (vk::PhysicalDeviceMemoryProperties reports total memory, not available memory).
On boot, with a few Chrome tabs open, my laptop already uses 2.8 of the 4 GiB in the smaller heap, so an attempt to allocate 2 GiB there will fail.

One could argue the better fix is to either

  • use VK_EXT_memory_budget to query actual memory usage in find_properties, so it doesn't select a memory type that lacks enough available memory
  • retry in a loop while allocating: if the allocation fails with ErrorOutOfDeviceMemory, try the next available memory type
  • or a combination of both

I went with the minimum viable solution here: force allocation from a known larger heap (in my case, heap 1).
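The retry idea from the second bullet could be sketched roughly like this (try_alloc and the surrounding names are hypothetical stand-ins for the real Vulkan allocation call):

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical allocation callback: returns false on an
// ErrorOutOfDeviceMemory-style failure for the given memory type.
using TryAlloc = std::function<bool(uint32_t memory_type_index)>;

// Walk all memory types whose flags satisfy required_flags and return
// the first one that actually allocates, instead of failing on the
// first candidate whose heap happens to be exhausted.
int alloc_with_fallback(const std::vector<uint32_t>& type_flags,
                        uint32_t required_flags,
                        const TryAlloc& try_alloc) {
    for (size_t i = 0; i < type_flags.size(); i++) {
        if ((type_flags[i] & required_flags) != required_flags) {
            continue;
        }
        if (try_alloc((uint32_t)i)) {
            return (int)i; // allocation succeeded on this type
        }
        // else: this heap is exhausted, try the next matching type
    }
    return -1; // no memory type could satisfy the allocation
}
```

This keeps the existing flag-based preference order and only moves on when the device actually reports an out-of-memory failure.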

Collaborator

@gyf304 Thank you.

Take a look at ggml-org/whisper.cpp#2451. This might be the kind of fallback that you described in your second point. Can you check whether that would fix your issue?

Member

FYI, I just pulled ggml-org/whisper.cpp#2451 and will soon sync it upstream with ggml and llama.cpp. Not sure if it fixes the problem here, but it worked for a couple of whisper.cpp-related issues.

