Add Windows Build Workflow (GitHub Actions) #976
base: main
Conversation
-DGGML_MAX_CONTEXTS=2048
Add GGML_MAX_CONTEXTS def
add_compile_definitions for GGML_MAX_CONTEXTS
Default GGML_MAX_CONTEXTS to 2048
Only bump maxstdio if GGML_MAX_CONTEXTS > 512
};
auto compute_1row = [&] (const float * xr) {
-    float weight[kBlockSize];
+    std::vector<float> weight(kBlockSize); // Fix for error C2131: expression did not evaluate to a constant
This code has been disabled in #707
if (GGML_AVX512)
    list(APPEND ARCH_FLAGS -mavx512f)
    list(APPEND ARCH_FLAGS -mavx512bw)
    list(APPEND ARCH_FLAGS -mavx512dq)
This does not actually enable the fast AVX512 matrix multiplication version. The fast version requires -mavx512vnni in addition to these.
Got it; that's the block below, which produces the avx512_vnni build variant:
if (GGML_AVX512_VNNI)
list(APPEND ARCH_FLAGS -mavx512vnni)
endif()
Is there any advantage to running AVX512 without VNNI over AVX2?
Very little, if any. If VNNI is not enabled, the matrix multiplication implementation falls back to the AVX2 version. Some portions of flash attention will still be processed with 512-bit registers/instructions, but the main cost (the K*Q matrix multiplication) will still be AVX2.
I haven't done a more fine-grained implementation of matrix multiplications because my CPU (Ryzen-7950X, so Zen4 core) uses "double-pumping", i.e., 512-bit instructions get executed as two 256-bit instructions, so the only time you gain something is when you can use AVX512 instructions that are not there (as 256-bit equivalents) in AVX2. The most important such instruction is _mm512_dpbusd_epi32, which is part of the VNNI extension. Without that instruction, using 512-bit registers/instructions brings zero benefit on Zen4 cores. Things are different on Zen5 and some of the high-end Intel CPUs, but without regular access to such CPUs there is no meaningful way to develop more fine-grained versions of the matrix multiplication kernels.
inline __m512i operator^(const __m512i& a, const __m512i& b) { return _mm512_xor_si512(a, b); }

// AVX2 integer bitwise operators
inline __m256i operator|(const __m256i& a, const __m256i& b) { return _mm256_or_si256(a, b); }
Not necessary.
inline __m256i operator^(const __m256i& a, const __m256i& b) { return _mm256_xor_si256(a, b); }

// NEON
#ifdef __ARM_NEON
Not necessary. The |,&,^ operators ended up by mistake in the AVX512 version, but are not used in any of the other implementations.
So, I specifically threw out all of |
Thank you for the comments. If the goal is wider adoption, then removing CI and builds works against it. The builds I produce have a few dozen users, and other projects (e.g. Jan: janhq/jan#6917) are actively asking for ready-to-download ik_llama.cpp artifacts. The demand is there. But without CI and artifact generation, only power users can realistically adopt ik_llama.cpp, which is unfortunate, since users with lower-end consumer hardware, the ones who benefit most from ik_llama.cpp's optimisations, are in my opinion the least likely to build from source. I suppose the same reasoning explains why users stick with Windows even though it is clearly not the best OS for running LLMs. From my perspective, facilitating integration with other frameworks and lowering the barrier to use will naturally boost adoption.
I can see the benefit of producing ready builds. But if so, then only for the platforms that are actually supported. To me it looks like this has been copied over from
I did what I could with the knowledge and resources I had. I had to start from something that worked and hack my way into making it function for ik_llama.cpp. I acknowledge that this CI is definitely "dirty" code and much of it could be thrown away; I can't afford the time required for full DevOps and cross-platform support right now. That's why everything except Windows CUDA is disabled. I tried multiple times to build without CUDA and for other OSs, but I couldn't get it to work. This setup certainly needs cleaning and isn't ready to merge if the goal is fully clean, production-quality code; it was intended more as a starting point.
Do you have a CPU-only version? I didn't find one.
Unfortunately not. I tried that a while ago and spent countless hours on it, but it fails to build without CUDA.
First, thank you for maintaining this project — it has been very useful, and I appreciate the work that has gone into it.
I initially created a fork to add automated Windows builds for my own use, since I needed ready-to-use binaries. Since this is functionality that could benefit other users as well, I’m submitting this pull request so the Windows build workflow can live directly in the main repository instead of in my fork.
This PR includes:
Builds must be manually triggered (I believe they could be automated to run after each commit, but I have not had the chance to dig into it). There is also some cleanup to do, specifically around other automated jobs that run meaningless checks. The original code was taken from mainline with some tweaks to adapt it to ik_llama.cpp.
My goal was to make the project more accessible to Windows users, specifically those who, like me, don't have the time to set up a development environment on Windows or the knowledge to do it. If you'd prefer changes to structure, naming, workflow triggers, or anything else, I'm happy to adjust the PR accordingly.
Thanks again for the project and for taking the time to review this!