vulkan: 64-bit im2col #16135
This is true for allocations, but not for buffers. If I disable the allocation size check in ggml_vk_create_buffer, your new test kind of runs on all my devices, but I'm not sure it runs correctly. Validation layers complain about the buffer size and the descriptor range, of course. I tried running your new im2col and im2col_3d tests: on AMD (RADV) the allocation succeeds, but the test runs fail. On all three devices the tests take very long to finish. They also used huge amounts of RAM (>80GB); I'm not sure whether that's the CPU backend or something else.
I've pushed a fix for the descriptor range validation failure. I'm not aware of one related to the buffer size. The large memory usage and slowness are expected: the test framework ends up with multiple copies of the huge tensor, converted to f32. I don't intend to enable these tests by default. I'm surprised the AMD driver is failing. Maybe it was related to the validation failure, but that would be a bit surprising since that descriptor isn't actually used.
Mesa driver development seems to work by building stuff, optimizing it and fixing issues when they come up. If things don't come up, they often don't work, so this is probably another case of "nobody tried to do this yet". We'll have to open an issue about it, most likely.
I mean this: It does not happen on Nvidia, because Nvidia reports a very large maxBufferSize, while AMD and Intel do not. Besides that, how do you plan to handle the allocation size check in
OK, that explains why I didn't see it on NVIDIA. I don't know how to get around that on other implementations. Maybe they can eventually relax the limit in their drivers.
Based on all this, maybe I should change it to check maxBufferSize. I've been planning to do it in a separate change.
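To make the distinction discussed above concrete, here is a minimal sketch of a size check that treats the two limits differently. It is not the code in `ggml_vk_create_buffer`; `DeviceLimits`, `SizeCheck` and `check_buffer_size` are hypothetical names, and the struct is a plain stand-in for values that would be queried from the driver (`maxMemoryAllocationSize` from the maintenance3 properties, `maxBufferSize` from the maintenance4 properties). The assumption, taken from this thread, is that exceeding `maxBufferSize` is a validation error while exceeding `maxMemoryAllocationSize` may still succeed on some drivers.

```cpp
#include <cstdint>

// Hypothetical stand-in for limits reported by the driver.
struct DeviceLimits {
    uint64_t max_memory_allocation_size; // mirrors maxMemoryAllocationSize
    uint64_t max_buffer_size;            // mirrors maxBufferSize
};

enum class SizeCheck {
    Ok,                    // within both limits
    ExceedsAllocationHint, // above maxMemoryAllocationSize: allocation may still succeed
    ExceedsBufferSize      // above maxBufferSize: buffer creation is invalid
};

// Classify a requested buffer size against both limits.
// The buffer limit is checked first because it is the hard one.
constexpr SizeCheck check_buffer_size(const DeviceLimits& lim, uint64_t size) {
    return size > lim.max_buffer_size            ? SizeCheck::ExceedsBufferSize
         : size > lim.max_memory_allocation_size ? SizeCheck::ExceedsAllocationHint
                                                 : SizeCheck::Ok;
}
```

A caller could reject `ExceedsBufferSize` outright and attempt the allocation for `ExceedsAllocationHint`, falling back if the driver returns an out-of-memory error.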
Add variants of the im2col shaders that use buffer_device_address/buffer_reference, and use 64-bit address calculations. This is needed for large convolutions used in stable-diffusion.cpp.
Rebased, to hopefully resolve old CI failures.
@jeffbolznv I'm looking into using buffer_reference to reduce the integer dot mmq shader shared memory size. As a first basic test I did this: It works on Intel and Nvidia, but completely crashes AMD RADV to the point that it automatically reboots the entire server, so something is very wrong. Can you tell me if that's correct use? If so, I need to open an issue with Mesa.
buffer_reference types always point to buffer memory, so I can't quite tell what this snippet is supposed to do. It looks like it declares a pointer to buffer memory in shared memory. If what you're trying to do is reuse the same shared memory bytes for different parts of the shader, e.g. make coopmat_stage use the same memory as buf_a_qs/buf_b_qs, then https://github.com/KhronosGroup/GLSL/blob/main/extensions/ext/GL_EXT_shared_memory_block.txt is the extension you want (warning: the spec text is not very helpful).
Oh, alright. I was basically trying to get pointer casting for shared memory, to have some more flexibility with buffering. I need to spend more time trying to understand these extensions first; they are quite hard to grasp and I can find barely any examples.
* vulkan: 64-bit im2col
  Add variants of the im2col shaders that use buffer_device_address/buffer_reference, and use 64-bit address calculations. This is needed for large convolutions used in stable-diffusion.cpp.
* fix validation error for large im2col
I've been working on getting leejet/stable-diffusion.cpp#778 to work in Vulkan. The main thing that's missing is that it does 2d and 3d convolutions that have intermediate im2col buffers that are larger than 4GB. This change fixes the im2col part, I'll make a separate change for the matmul part.
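As a rough illustration of why 64-bit address calculations are needed, the sketch below counts the elements of an im2col buffer and computes the flat offset of its last element in both 32 and 64 bits. The layout `[N, OH, OW, IC*KH*KW]`, the helper names, and the example shape (512 input channels, 3x3 kernel, 1024x1024 output) are assumptions chosen for illustration, not values taken from stable-diffusion.cpp.

```cpp
#include <cstdint>

// Element count of an im2col buffer, assuming layout [N, OH, OW, IC*KH*KW]:
// one row of IC*KH*KW values per output pixel, per batch item.
constexpr uint64_t im2col_elements(uint64_t N, uint64_t IC, uint64_t KH, uint64_t KW,
                                   uint64_t OH, uint64_t OW) {
    return N * OH * OW * IC * KH * KW;
}

// Flat offset computed entirely in 32 bits: wraps modulo 2^32 once the
// buffer holds more than 2^32 elements.
constexpr uint32_t offset32(uint32_t row, uint32_t row_stride, uint32_t col) {
    return row * row_stride + col;
}

// Same offset with the multiplication widened to 64 bits first.
constexpr uint64_t offset64(uint32_t row, uint32_t row_stride, uint32_t col) {
    return uint64_t(row) * row_stride + col;
}
```

For the assumed shape, each row holds 512 * 3 * 3 = 4608 values and there are 1024 * 1024 rows, giving 4,831,838,208 elements, which is more than 2^32 even before multiplying by the element size. The last element sits at row 1048575, column 4607: `offset64` yields 4,831,838,207, while `offset32` wraps to 536,870,911 and would address the wrong location.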
Memory allocations larger than maxMemoryAllocationSize are not technically forbidden, and at least NVIDIA's Windows driver will allocate more than 4GB.