-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Fixed cooperative matrix shaders on Vulkan builds when CROSS_COMPILE ON #2914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @ggerganov , @slaren , Can one of you, please, take a look at this one when you have some time? Or any other approach to pass the |
@0cc4m @jeffbolznv @netrunnereve Can you take a look at this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm supportive of this change, just have a couple questions since I'm not a cmake expert.
ggml/src/ggml-vulkan/CMakeLists.txt
Outdated
../../include/ggml-vulkan.h | ||
) | ||
if (CMAKE_CROSSCOMPILING) | ||
# Make this configurable for cross builds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this part. Why does cross compiling skip the checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this change, it was not skipping this check, but it was not forwarding it to the vulkan-shaders-gen
as for CMAKE_CROSSCOMPILING it is starting it with ExternalProject_Add
so the compile definition added with add_compile_definitions(GGML_VULKAN_COOPMAT_GLSLC_SUPPORT)
was not forwarded to that external project (only added to the current project.
For not CMAKE_CROSSCOMPILING it was added with add_subdirectory
which maintained the compile definitions:
After my change, for CMAKE_CROSSCOMPILING
, we don't try to detect the version of the coopmap based on the current system but retrieving it from the options and pass it to the external project as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it work if you remove this condition? i.e. detect whether it's supported and pass it to the external project?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that should work:
- if we remove this condition
- detect in the main project if we have the COOPMAT and which version
- pass it to the
vulkan-shaders-gen
external project - in the CMakeLists for that one, we should still check the option and call
target_compile_definitions
The reason why I didn't use this approach was that I was thinking that it is clearer to not check the build host for this version when CROSS_COMPILING as we're not building for this system => but indeed, if this system won't have that specific version, we won't be able to build anyway.
I'm happy to simplify it like that if you think it would be better :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link to CI in Whisper.net after the change: https://github.com/sandrohanea/whisper.net/actions/runs/14087208133/job/39454313299
|
||
set(TARGET vulkan-shaders-gen) | ||
add_executable(${TARGET} vulkan-shaders-gen.cpp) | ||
if (GGML_VULKAN_COOPMAT_GLSLC_SUPPORT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ggml-org/llama.cpp#11695 (comment) I had guessed that we may need to pass through a cmake variable telling this makefile which version of glslc to use? Is that not necessary? It's just the #defines that weren't making it through?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure if we need additional version to be used (and in my case it was working as same version as the host system was used). I can confirm that only providing the compile definitions is working for my build => https://github.com/sandrohanea/whisper.net/actions/runs/13977409227/job/39134498314
…SUPPORT and DGGML_VULKAN_COOPMAT2_GLSLC_SUPPORT based on detected values
ggml/src/ggml-vulkan/CMakeLists.txt
Outdated
ERROR_VARIABLE glslc_error) | ||
|
||
if (${glslc_error} MATCHES ".*extension not supported: GL_NV_cooperative_matrix2.*") | ||
message(STATUS "GL_NV_cooperative_matrix2 not supported by glslc") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be an "OFF" here like there is for coopmat1 above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
I got too fast on it, thanks for catching it!
It would not be consistent otherwise => won't cause issues as default will be OFF anyway, but I think it is better to have it explicit.
CI in Whisper.net for:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks good to me.
I'm still concerned about a case where the cross-compile picks up a different glslc in the host environment vs the target environment, and if those glslcs don't have the same level of support then we may still have the same problem. I think this would be most likely to happen for android builds, where the android NDK glslc is very old. If that happens, we could fix it by also passing the glslc executable through cmake variables. I'd also be OK with you doing that now, if you want. It should be OK to use a newer glslc, since the code it generates is backward compatible.
@bandoti is the most knowledgeable about cmake and cross compiling, I think. |
… specification for cross-compilation
Implemented this option to allow the args for different glslc : |
There are a couple of PRs related to this in the Llama codebase: Fix for the GGML_VULKAN_COOPMAT_GLSLC_SUPPORT: And I am working on adding cross-compile to CI here (also Llama): @sandrohanea Here is the Github CI build that does not require the Vulkan SDK on Linux, which should work with Ubuntu 24. This is possible by installing CC: @Icenowy |
ggml/src/ggml-vulkan/CMakeLists.txt
Outdated
find_package(Vulkan COMPONENTS glslc REQUIRED) | ||
|
||
# Allow explicitly specifying glslc executable for cross-compilation scenarios | ||
if(DEFINED GGML_VULKAN_GLSLC_EXECUTABLE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not convinced that providing a command-line override is the right solution here. When cross-compiling the standard behavior is to separate build/target utilities via the find-root path. For example, the line above is stating that it requires glslc, but it seems this should not in fact be required but found separately using a appropriate paths.
For example, what we are basically saying is:
- We need to find Vulkan as defined in the cross-compile toolchain.
- We need the build system glslc no matter what.
As such, it would be more consistent with general cross-compiling to instead use find_program(VULKAN_GLSLC_EXECUTABLE glslc REQUIRED NO_CMAKE_FIND_ROOT_PATH)
and change the Vulkan find command to find_package(Vulkan)
(not specifying the glslc component there).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about the Android case though, where there are multiple NDKs, and each might have its own glslc might be worthwhile to support both modes. So, the default behavior could be to use find_program, but also allow the command-line option to ensure we have our bases covered. There have been a couple issues surrounding this so probably best to play it safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so I just took a quick look in the FindVulkan.cmake
file on my system and quickly checked the docs. Based on the docs, the search for glslc will not be repeated if it is cached first:
This command is used to find a program. A cache entry, or a normal variable if NO_CACHE is specified, named by is created to store the result of this command. If the program is found the result is stored in the variable and the search will not be repeated unless the variable is cleared. If nothing is found, the result will be -NOTFOUND.
One possibility is to set the cache variable Vulkan_GLSLC_EXECUTABLE
before calling find_package(Vulkan COMPONENTS glslc REQUIRED)
. So, first running find_program(Vulkan_GLSLC_EXECUTABLE glslc REQUIRED NO_CMAKE_FIND_ROOT_PATH)
would override the search. Similarly, the command-line approach overriding the Vulkan_GLSLC_EXECUTABLE cache variable would override that, and then we don't need a separate variable for the same thing. Something to consider. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about the Android case though, where there are multiple NDKs, and each might have its own glslc might be worthwhile to support both modes.
IMO we should just always use the glslc installed on the build system. The NDK glslc is egregiously out of date, it's really not what we want to be using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the description of the PR, let me know what do you think
…andling and enhance cross-compilation support Detect what is possible but allow outside configuration (Vulkan_GLSLC_EXECUTABLE, GGML_VULKAN_COOPMAT_GLSLC_SUPPORT, GGML_VULKAN_COOPMAT2_GLSLC_SUPPORT)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After looking over this a bit more (see review notes) I think we should consider the simplicity of llama.cpp #12272, which directly addresses the need to fix only the coopmat shader generation. I think it is in our best interest to resolve that PR first (since it came first anyhow) and circle back to this one for handling the user-specified glslc.
Note: I applied a quick patch to the cross-build CI PR I have been working on to see if it can fix the issues. Had some sync issues with the CI image (unrelated to the change) so I might have to wait a bit before re-testing, but I will post a status update here once it completes.
cc: @Icenowy
../../include/ggml-vulkan.h | ||
) | ||
if (CMAKE_CROSSCOMPILING) | ||
# Check if user has explicitly set these options |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what is the need for these Android checks? Perhaps I confused matters mentioning Android, but I believe it was working correctly with the feature detection—unless I am missing something. Feature detection by attempting to compile the shaders should be able to work on all systems.
set(GLSLC_EXECUTABLE ${Vulkan_GLSLC_EXECUTABLE}) | ||
message(STATUS "Using glslc from parent CMake: ${GLSLC_EXECUTABLE}") | ||
# Also define this at compile time to set the default value in the shader generator | ||
add_compile_definitions(VULKAN_GLSLC_EXECUTABLE="${GLSLC_EXECUTABLE}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I don't think we need this functionality at all. The vulkan-shaders-gen binary has a command-line switch which receives a path to the specified glslc. There is really no need to provide a statically compiled path to glslc, since it is already being passed in at runtime.
// Default glslc path, can be overridden at compile time or runtime | ||
#ifdef VULKAN_GLSLC_EXECUTABLE | ||
// If VULKAN_GLSLC_EXECUTABLE is defined at compile time, use it | ||
#define DEFAULT_GLSLC_EXECUTABLE VULKAN_GLSLC_EXECUTABLE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See note above. I don't think there's any benefit to bake an executable path into the binary when it is already supplied as a command-line switch to vulkan-shaders-gen.
I see that ggml-org/llama.cpp#12272 was merged 🚀 Will close this PR and if different glslc versions are needed, we can create another one as a follow-up. Thank you all for your valuable inputs! |
Sounds good—please feel free to loop me in if that scenario comes up! I'm happy to help. |
Cross-Compilation System for GGML Vulkan
Key Components and Flow
glslc Executable Handling:
find_program(Vulkan_GLSLC_EXECUTABLE glslc NO_CMAKE_FIND_ROOT_PATH)
-DVulkan_GLSLC_EXECUTABLE=/path/to/glslc
)Cooperative Matrix Support Detection:
-DGGML_VULKAN_COOPMAT_GLSLC_SUPPORT=ON
-DGGML_VULKAN_COOPMAT2_GLSLC_SUPPORT=ON
Shader Compilation Process:
Android NDK Compatibility:
CI (in Whisper.net)
As whisper.cpp doesn't have some CI for Vulkan, I'm attaching my CI runs for whisper.net (only building, no tests yet :( ):
With Cross-Compile:
https://github.com/sandrohanea/whisper.net/actions/runs/14092352637/job/39471912246
Without Cross-Compile:
https://github.com/sandrohanea/whisper.net/actions/runs/14092371745/job/39471978181