-
Notifications
You must be signed in to change notification settings - Fork 5.1k
vulkan: fix SIGSEGV on Android when using buffer_device_address #3590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4353,6 +4353,7 @@ static vk_device ggml_vk_get_device(size_t idx) { | |
| device->coopmat_support = false; | ||
| device->integer_dot_product = false; | ||
| bool bfloat16_support = false; | ||
| bool buffer_device_address_khr = false; | ||
|
|
||
| for (const auto& properties : ext_props) { | ||
| if (strcmp("VK_KHR_maintenance4", properties.extensionName) == 0) { | ||
|
|
@@ -4397,6 +4398,8 @@ static vk_device ggml_vk_get_device(size_t idx) { | |
| } else if (strcmp("VK_EXT_memory_priority", properties.extensionName) == 0 && | ||
| getenv("GGML_VK_ENABLE_MEMORY_PRIORITY")) { | ||
| device->memory_priority = true; | ||
| } else if (strcmp("VK_KHR_buffer_device_address", properties.extensionName) == 0) { | ||
| buffer_device_address_khr = true; | ||
| } | ||
| } | ||
|
|
||
|
|
@@ -4791,7 +4794,9 @@ static vk_device ggml_vk_get_device(size_t idx) { | |
| throw std::runtime_error("Unsupported device"); | ||
| } | ||
|
|
||
| device_extensions.push_back("VK_KHR_16bit_storage"); | ||
| if (fp16_storage) { | ||
| device_extensions.push_back("VK_KHR_16bit_storage"); | ||
| } | ||
|
|
||
| #ifdef GGML_VULKAN_VALIDATE | ||
| device_extensions.push_back("VK_KHR_shader_non_semantic_info"); | ||
|
|
@@ -4801,6 +4806,11 @@ static vk_device ggml_vk_get_device(size_t idx) { | |
| device_extensions.push_back("VK_KHR_shader_float16_int8"); | ||
| } | ||
|
|
||
| // Required for physical devices that only support Vulkan 1.1 | ||
| if (device->buffer_device_address && buffer_device_address_khr) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you check whether buffer_device_address is initialized based on stack garbage in VkPhysicalDeviceVulkan12Features? Another option might be to just zero-init that structure and then you don't need the getBufferDeviceAddress function pointer.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same isssue if I do
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If this was not the case, then how did we get buffer_device_address set to true in the first place?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure if I understand your comment.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My question is why, without any of your changes, is device->buffer_device_address == true?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That makes senses. I printed ggml-vulkan is testing loader version and not the actual device ? But if we change that, this will disable most android devices. I can confirm ggml works well with Vulkan GPU. I tested 3 different devices for info (all crashing without my patches). I also confirm that only forcing
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My question: Is 1.3 Loader aware of 1.2 features ? So that 1.2 features are loaded and can be used on 1.1 devices If that's the case, I think we should keep that features for android devices. I could store
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I think it is, and that's probably not quite what we want. ggml-vulkan generally checks for any optional features it relies on (like bufferDeviceAddress - optional in Vulkan 1.2) but if something is mandatory in Vulkan 1.2 we may not check for it.
I don't think the loader emulates features structures. I'd be very surprised if that's the case. I'd still like to better understand how vk12_features.bufferDeviceAddress is getting set to true.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for your review.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Enabling VK Validation layer on Android shows the clear pictures.
Since it works as it is for few 1.1 GPU, I'm wondering If I can adapt ggml for Vulkan 1.1, the task will be:
Would you be interested in a PR ? Maybe you don't want to add more complexity, by supporting older Vulkan, but it would benefit a lot of android devices. Validation logs when loading a model: |
||
| device_extensions.push_back("VK_KHR_buffer_device_address"); | ||
| } | ||
|
|
||
| #if defined(VK_KHR_cooperative_matrix) | ||
| if (device->coopmat_support) { | ||
| // Query supported shapes | ||
|
|
@@ -4924,6 +4934,10 @@ static vk_device ggml_vk_get_device(size_t idx) { | |
| device_create_info.setPNext(&device_features2); | ||
| device->device = device->physical_device.createDevice(device_create_info); | ||
|
|
||
| // optionally initialize the dispatcher with a vk::Device to get | ||
| // device-specific function pointers (needed on Android) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this actually needed? Do you know why?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is needed, I double-checked by removing it and keeping other fixes: same crash. Note that this is also needed on a WIP branch on VLC for android, I use volk and I had to call
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm concerned this won't work on systems with multiple vulkan devices.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Indeed, that's a problem... |
||
| VULKAN_HPP_DEFAULT_DISPATCHER.init(device->device); | ||
|
|
||
| // Queues | ||
| ggml_vk_create_queue(device, device->compute_queue, compute_queue_family_index, 0, { vk::PipelineStageFlagBits::eComputeShader | vk::PipelineStageFlagBits::eTransfer }, false); | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This extension was promoted to Vulkan 1.1. Is this change necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on x86_64 emulator, without this change:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it ought to work if we don't push this extension string at all, but this change is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The capability is definitely required, the backend cannot run without
fp16_storage. If the way we are currently initialising it with the extension does not work for you, please find another way (like not pushing the string at all). Making it conditional is just confusing.