
Conversation

@Thireus (Contributor) commented Nov 17, 2025

First, thank you for maintaining this project — it has been very useful, and I appreciate the work that has gone into it.

I initially created a fork to add automated Windows builds for my own use, since I needed ready-to-use binaries. Since this is functionality that could benefit other users as well, I’m submitting this pull request so the Windows build workflow can live directly in the main repository instead of in my fork.

This PR includes:

  1. GitHub Actions workflow for building the project on Windows
    
  2. Automatic artifact uploads so users can download ready-to-use Windows builds
    
  3. Other small tweaks to ensure the code compiles on Windows.
    

Builds must be triggered manually (I believe they could be automated to run after each commit, but I have not had the chance to dig into it). There is also some cleanup left to do, specifically around other automated jobs that run checks which are not meaningful for this repository. The original workflow code was taken from mainline llama.cpp and tweaked to adapt it to ik_llama.cpp.

My goal was to make the project more accessible to Windows users, specifically those who, like myself, cannot find the time to set up a development environment on Windows or lack the knowledge to do so. If you’d prefer changes to the structure, naming, workflow triggers, or anything else, I’m happy to adjust the PR accordingly.

Thanks again for the project and for taking the time to review this!

    };
    auto compute_1row = [&] (const float * xr) {
-       float weight[kBlockSize];
+       std::vector<float> weight(kBlockSize); // float weight[kBlockSize]; - Fix for error C2131: expression did not evaluate to a constant
@ikawrakow (Owner) commented:

This code has been disabled in #707
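For context, a minimal, illustrative reconstruction of why the change was made for MSVC (the surrounding names are stand-ins, not the real ik_llama.cpp definitions): MSVC requires array bounds to be constant expressions and reports C2131 otherwise, whereas GCC and Clang accept a runtime bound as a variable-length-array extension, so switching to std::vector keeps the code portable.

    // Illustrative sketch only, not the actual ik_llama.cpp code; `kBlockSize`
    // here is a stand-in for a size MSVC cannot treat as a constant expression.
    #include <vector>

    void demo(int kBlockSize) {
        // float weight[kBlockSize];           // GCC/Clang accept this as a VLA extension,
        //                                     // MSVC rejects it with error C2131
        std::vector<float> weight(kBlockSize); // portable: the size is set at run time
        (void)weight;                          // suppress unused-variable warnings
    }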

    if (GGML_AVX512)
        list(APPEND ARCH_FLAGS -mavx512f)
        list(APPEND ARCH_FLAGS -mavx512bw)
        list(APPEND ARCH_FLAGS -mavx512dq)
@ikawrakow (Owner) commented:

This does not actually enable the fast AVX512 matrix multiplication version. The fast version requires -mavx512vnni in addition to these.

@Thireus (Contributor, Author) commented:

Got it, that would be the block below, which produces the avx512_vnni build variant:

        if (GGML_AVX512_VNNI)
            list(APPEND ARCH_FLAGS -mavx512vnni)
        endif()

Is there any advantage to running avx512 without vnni over avx2?

@ikawrakow (Owner) commented Nov 18, 2025

Very little, if any. If VNNI is not enabled, the matrix multiplication implementation will use the AVX2 version. There will be some portions in flash attention that will get processed with 512-bit registers/instructions, but the main cost (the K*Q matrix multiplication) will still be AVX2.

I haven't done a more fine-grained implementation of matrix multiplications because my CPU (Ryzen-7950X, so Zen4 core) uses "double-pumping", i.e., 512-bit instructions get executed as two 256-bit instructions, so the only time you gain something is when you can use AVX512 instructions that are not there (as 256-bit equivalents) in AVX2. The most important such instruction is _mm512_dpbusd_epi32, which is part of the VNNI extension. Without that instruction, using 512-bit registers/instructions brings zero benefit on Zen4 cores. Things are different on Zen5 and some of the high-end Intel CPUs, but without regular access to such CPUs there is no meaningful way to develop more fine-grained versions of the matrix multiplication kernels.
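For illustration, a minimal sketch (not the actual ik_llama.cpp kernels; the helper name is made up) of the difference described above: with VNNI the u8·s8 dot-product accumulation is a single instruction over a 512-bit register, while the AVX2 path needs a three-instruction sequence over 256-bit registers and goes through saturating 16-bit intermediates.

    // Illustrative only. Accumulate a dot product of unsigned 8-bit values (`u8`)
    // with signed 8-bit values (`s8`) into the 32-bit lanes of `acc`.
    #include <immintrin.h>

    #if defined(__AVX512VNNI__)
    // A single instruction handles 64 byte pairs and accumulates directly into i32.
    static inline __m512i dot_accum(__m512i acc, __m512i u8, __m512i s8) {
        return _mm512_dpbusd_epi32(acc, u8, s8);
    }
    #elif defined(__AVX2__)
    // Three instructions handle 32 byte pairs, and _mm256_maddubs_epi16 saturates
    // its 16-bit intermediate sums, which the VNNI instruction avoids.
    static inline __m256i dot_accum(__m256i acc, __m256i u8, __m256i s8) {
        __m256i p16 = _mm256_maddubs_epi16(u8, s8);                 // u8*s8 -> adjacent i16 sums
        __m256i p32 = _mm256_madd_epi16(p16, _mm256_set1_epi16(1)); // widen pairs -> i32
        return _mm256_add_epi32(acc, p32);
    }
    #endif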

inline __m512i operator^(const __m512i& a, const __m512i& b) { return _mm512_xor_si512(a, b); }

// AVX2 integer bitwise operators
inline __m256i operator|(const __m256i& a, const __m256i& b) { return _mm256_or_si256(a, b); }
@ikawrakow (Owner) commented:

Not necessary.

inline __m256i operator^(const __m256i& a, const __m256i& b) { return _mm256_xor_si256(a, b); }

// NEON
#ifdef __ARM_NEON
@ikawrakow (Owner) commented:

Not necessary. The |, &, ^ operators ended up in the AVX512 version by mistake, but they are not used in any of the other implementations.

@ikawrakow (Owner) commented:

So, I specifically threw out all of the llama.cpp CI and GitHub Actions. And now we will have them all back, almost all failing?

@Thireus (Contributor, Author) commented Nov 17, 2025

Thank you for the comments.

If the goal is wider adoption, then removing CI and builds works against it. The builds I produce have a few dozen users, and other projects (e.g. Jan: janhq/jan#6917) are actively asking for ready-to-download ik_llama.cpp artifacts.

The demand is there. But without CI and artifact generation, only power users can realistically adopt ik_llama.cpp, which is unfortunate, since users with lower-end consumer hardware, the ones who benefit most from ik_llama.cpp’s optimisations, are in my opinion the least likely to build from source. The same reasoning applies, I suppose, to why users keep sticking to Windows even though it is clearly not the best OS for running LLMs.

From my perspective, facilitating integration with other frameworks and lowering the barrier to use will naturally boost adoption.

@ikawrakow (Owner) commented:

I can see the benefit of producing ready builds, but if so, then only for the platforms that are actually supported. To me it looks like this has been copied over from llama.cpp, so there is SYCL, RISC-V, and the kitchen sink.

llama.cpp has put way more effort into making it work on as many platforms as possible. Their back-ends are loaded dynamically, which makes it easier to provide build artifacts. None of this is true here. I would have thought that you had done a few Windows configurations (CPU-only, CPU+CUDA, etc.), and that was that. A lot of the llama.cpp stuff just does not apply here anymore.

@Thireus Thireus marked this pull request as draft November 17, 2025 16:41
@Thireus (Contributor, Author) commented Nov 17, 2025

I did what I could with the knowledge and resources I had. I had to start from something that worked and hack my way into making it function for ik_llama.cpp. I acknowledge that this CI is definitely “dirty” code and much of it could be thrown away.

I can’t afford to spend the amount of time required for full DevOps and cross-platform support right now. That’s why everything except Windows CUDA is disabled. I tried multiple times to build without CUDA and for other OSs, but I couldn’t get it to work.

This setup certainly requires cleaning and isn’t ready to merge if the goal is fully clean, production-quality code. It was intended more as a starting point.

@Kreatifchk commented:

Do you have a CPU-only version? I didn’t find one.

@Thireus (Contributor, Author) commented Nov 24, 2025

Unfortunately not. I tried that a while ago and spent countless hours on it, but it fails to build without CUDA.
