vulkan: fuse adds #15252
Fuse adds that have the same shape, which are common in MoE models. It currently fuses up to 6 adds because we assume no more than 8 descriptors per dispatch, but this limit could be changed.
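To make the 6-versus-8 arithmetic concrete, here is a minimal CPU-side sketch. It assumes that each input tensor and the output tensor consume one descriptor each, so a chain of N adds needs N + 2 descriptors. The names `max_fused_adds` and `fused_add` are hypothetical and are not taken from the PR; the real implementation is a Vulkan shader.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: one descriptor per input tensor plus one for the output.
// A chain of N adds reads N + 1 inputs and writes 1 output: N + 2 descriptors.
constexpr std::size_t max_fused_adds(std::size_t max_descriptors) {
    return max_descriptors >= 2 ? max_descriptors - 2 : 0;
}

// Hypothetical CPU reference for the fused operation: all inputs share one
// shape, and each output element is written exactly once instead of once
// per intermediate add.
std::vector<float> fused_add(const std::vector<std::vector<float>>& inputs) {
    if (inputs.empty()) return {};
    const std::size_t n = inputs.front().size();
    for (const auto& in : inputs) assert(in.size() == n);  // same shape only
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        float acc = 0.0f;
        for (const auto& in : inputs) acc += in[i];
        out[i] = acc;
    }
    return out;
}
```

Under that assumption, a budget of 8 descriptors gives 6 fused adds, which matches the limit stated above. Raising the descriptor budget would raise the limit accordingly.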
Looks good on AMD and Nvidia, but I can't get it to run on Intel.
terminate called after throwing an instance of 'vk::DeviceLostError'
what(): vk::Device::waitForFences: ErrorDeviceLost
I'll investigate further later.
Strange. Any validation failures? Does the backend test fail, or just in real models?
Yeah, the test fails too on Intel:
Edit: No validation failures. Probably a driver bug.
Shall I just disable the optimization for Intel?
Yeah, I don't see why it's failing.
Hi @0cc4m. I wanted to test the crash you were seeing on Intel GPUs, but so far I haven't been able to reproduce it. How exactly were you testing this? I ran the following test:
diff --git a/ggml/src/ggml-vulkan/ggml-vulkan.cpp b/ggml/src/ggml-vulkan/ggml-vulkan.cpp
index 7ef93806..24ede177 100644
--- a/ggml/src/ggml-vulkan/ggml-vulkan.cpp
+++ b/ggml/src/ggml-vulkan/ggml-vulkan.cpp
@@ -3575,7 +3575,7 @@ static vk_device ggml_vk_get_device(size_t idx) {
device->multi_add = vk12_props.shaderRoundingModeRTEFloat16 &&
device->properties.limits.maxPushConstantsSize >= sizeof(vk_op_multi_add_push_constants) &&
vk12_features.runtimeDescriptorArray &&
- device->vendor_id != VK_VENDOR_ID_INTEL &&
+ // device->vendor_id != VK_VENDOR_ID_INTEL &&
getenv("GGML_VK_DISABLE_MULTI_ADD") == nullptr;
if (device->subgroup_size_control) {
Execution log as follows
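For readers skimming the diff, the gating condition it touches can be sketched as a standalone predicate. This is an illustration only: `DeviceCaps`, `multi_add_enabled`, and the `skip_intel` switch are hypothetical names, not part of ggml. The environment variable name comes from the diff above, and `0x8086` is Intel's PCI vendor ID.

```cpp
#include <cstdint>
#include <cstdlib>

constexpr std::uint32_t kVendorIdIntel = 0x8086;  // Intel's PCI vendor ID

// Hypothetical stand-in for the device properties the diff consults.
struct DeviceCaps {
    bool rte_fp16;                          // shaderRoundingModeRTEFloat16
    std::uint32_t max_push_constants_size;  // limits.maxPushConstantsSize
    bool runtime_descriptor_array;          // runtimeDescriptorArray
    std::uint32_t vendor_id;
};

// Mirrors the shape of the condition in the diff. skip_intel == false
// corresponds to commenting out the vendor check, as in the test above.
bool multi_add_enabled(const DeviceCaps& caps,
                       std::uint32_t push_constants_size,
                       bool skip_intel) {
    return caps.rte_fp16 &&
           caps.max_push_constants_size >= push_constants_size &&
           caps.runtime_descriptor_array &&
           (!skip_intel || caps.vendor_id != kVendorIdIntel) &&
           std::getenv("GGML_VK_DISABLE_MULTI_ADD") == nullptr;
}
```

Setting `GGML_VK_DISABLE_MULTI_ADD` to any value in the environment turns the feature off in this sketch regardless of vendor, since only the variable's presence is checked.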
Hi @rillomas, I ran this on Linux. From past reports, I have gathered that the Linux ANV driver is less stable than the proprietary Windows driver. I can reproduce the crash with your diff like this: Crash log
It works if I disable multi_add using
Also, test-backend-ops fails in the test that was added in this PR:
Can you run this with
Environment:
CPU: AMD EPYC 7302
Let me know if you need more info.
@0cc4m Log output
Is the proper way to report this directly in the Mesa issue tracker, or do you have a more direct connection to the driver team?
I'm a Windows guy, so I don't have connections with the Linux driver team. I can check, but it's probably better to report to Mesa first.