forked from ggml-org/llama.cpp
Draft: fuck around with windows CI build #1
Open

AsbjornOlling wants to merge 4,427 commits into master from fix-vulkan-build-failure-again-again
Conversation
The bug caused a crash on load with venvs created with --system-site-packages to use python3-pyside6.qtwidgets=6.6.2-4 from Kubuntu 24.10.
* gguf : use ggml log system
* llama : remove unnecessary new lines in exception messages
* minja: sync google/minja@f06140f
  - google/minja#67 (@grf53)
  - google/minja#66 (@taha-yassine)
  - google/minja#63 (@grf53)
  - google/minja#58

Co-authored-by: ochafik <[email protected]>
…13551)

* webui : improve accessibility for visually impaired people
* add a11y for extra contents
* fix some labels being read twice
* add skip to main content
* vulkan: move common FA code to flash_attn_base.comp
* vulkan: move common FA index/stride setup code to flash_attn_base.comp
* build fix
* parallel : add option for non-shared and larger prompts
* parallel : update readme [no ci]
* cont : add note about base models [no ci]
* parallel : better var name

ggml-ci
…13595)

* fix: use the current build config for `vulkan-shaders-gen`
* fix: only pass a valid build type to `--config`
* added no-prefill-assistant flag
* reworded documentation comment
* updated server README.md
Signed-off-by: noemotiovon <[email protected]>
* Modern Linux defaults /proc/sys/kernel/yama/ptrace_scope to 1
* Fixed lldb attach
* Simplify by having the child do ggml_print_backtrace_symbols
ggml-ci
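For context on the ptrace_scope item above: with Yama's ptrace_scope=1, only an ancestor process may attach a debugger by default, so a parent that wants a forked child to attach back to it must whitelist the child explicitly. A minimal sketch, assuming Linux; the function name and the lldb invocation are illustrative, not the actual llama.cpp code:

```cpp
// Sketch only: let a forked child attach a debugger to the parent even
// when /proc/sys/kernel/yama/ptrace_scope is 1 (the modern Linux default).
#include <sys/prctl.h>  // prctl, PR_SET_PTRACER
#include <sys/wait.h>   // waitpid
#include <unistd.h>     // fork, getppid, execlp, _exit
#include <cstdio>

static void print_backtrace_via_child(void) {
    pid_t child = fork();
    if (child < 0) {
        return; // fork failed, nothing to do
    }
    if (child == 0) {
        // Child: attach lldb to the parent and dump its stack.
        char pid_buf[32];
        snprintf(pid_buf, sizeof(pid_buf), "%d", (int) getppid());
        execlp("lldb", "lldb", "--batch", "-o", "bt",
               "--attach-pid", pid_buf, (char *) nullptr);
        _exit(1); // only reached if exec failed
    }
    // Parent: with ptrace_scope=1 only an ancestor may attach, so
    // explicitly allow the child to trace us. Real code would also
    // synchronize so the child cannot attach before this call runs.
    prctl(PR_SET_PTRACER, child, 0, 0, 0);
    waitpid(child, nullptr, 0);
}
```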
* wip llama 4 conversion
* rm redundant __init__
* fix conversion
* fix conversion
* test impl
* try this
* reshape patch_embeddings_0
* fix view
* rm ffn_post_norm
* cgraph ok
* f32 for pos embd
* add image marker tokens
* Llama4UnfoldConvolution
* correct pixel shuffle (see the sketch after this list)
* fix merge conflicts
* correct
* add debug_graph
* logits matched, but it still perceives the image incorrectly
* fix style
* add image_grid_pinpoints
* handle llama 4 preprocessing
* rm load_image_size
* rm unused line
* fix
* small fix 2
* add test & docs
* fix llava-1.6 test
* test: add notion of huge models
* add comment
* add warn about degraded quality
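The "correct pixel shuffle" item refers to the standard depth-to-space rearrangement used by vision encoders to trade channels for spatial resolution. A generic C++ sketch of that operation, not the actual converter code (which lives in the Python conversion script):

```cpp
// Generic pixel shuffle (depth-to-space): [C*r*r, H, W] -> [C, H*r, W*r],
// both tensors flattened row-major.
#include <vector>

std::vector<float> pixel_shuffle(const std::vector<float> & in,
                                 int C, int H, int W, int r) {
    std::vector<float> out((size_t) C * H * r * W * r);
    for (int c = 0; c < C; ++c)
    for (int dy = 0; dy < r; ++dy)
    for (int dx = 0; dx < r; ++dx)
    for (int h = 0; h < H; ++h)
    for (int w = 0; w < W; ++w) {
        int ic = (c * r + dy) * r + dx;  // source channel
        int oy = h * r + dy;             // target row
        int ox = w * r + dx;             // target column
        out[((size_t) c * H * r + oy) * W * r + ox] =
            in[((size_t) ic * H + h) * W + w];
    }
    return out;
}
```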
* sycl: reviewing and updating docs
* Updates Runtime error codes
* Improves OOM troubleshooting entry
* Added a llama 3 sample
* Updated supported models
* Updated releases table
* metal : use less stack memory in FA kernel

ggml-ci

* cont : fix BF16 variant
Enable uniform linking with subproject and with find_package.
ggml-ci
… device is available, to allow fallback to CPU backend (ggml-org#14099)
ggml-ci
…org#13834)

* kv-cache : avoid modifying recurrent cells when setting inputs

* kv-cache : remove inp_s_mask

It was replaced with equivalent and simpler functionality with rs_z (the first zeroed state) and the already-existing inp_s_copy.

* kv-cache : fix non-consecutive token pos warning for recurrent models

The problem was apparently caused by how the tail cells were swapped.

* graph : simplify logic for recurrent state copies

* kv-cache : use cell without src refs for rs_z in recurrent cache

* llama-graph : fix recurrent state copy

The `state_copy` shuffle assumes everything is moved at once, which is not true when `states_extra` is copied back to the cache before copying the range of states between `head` and `head + n_seqs`. This is only a problem if any of the cells in [`head`, `head + n_seqs`) have an `src` in [`head + n_seqs`, `head + n_kv`), which does happen when `n_ubatch > 1` in the `llama-parallel` example.

Changing the order of the operations avoids the potential overwrite before use, although when copies are avoided (like with Mamba2), this will require further changes.

* llama-graph : rename n_state to state_size in build_recurrent_state

This naming should reduce confusion between the state size and the number of states.
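A toy illustration of the overwrite-before-use hazard that the `state_copy` fix above avoids, using a plain array in place of the recurrent cells; all names and values here are illustrative:

```cpp
// Every cell reads its new state from another cell. Writing any destination
// before all sources are read can clobber a value that is still needed;
// gathering into a fresh buffer and committing in one step cannot.
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> cache  = {10, 11, 12, 13}; // recurrent state per cell
    std::vector<int> copies = { 2,  3,  0,  1}; // cell i reads cache[copies[i]]

    std::vector<int> next(cache.size());
    for (size_t i = 0; i < cache.size(); ++i) {
        next[i] = cache[copies[i]];  // read all sources first...
    }
    cache = next;                    // ...then commit in one step

    for (int v : cache) printf("%d ", v);  // prints: 12 13 10 11
    printf("\n");
}
```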
Use the same descriptor set layout for all pipelines (MAX_PARAMETER_COUNT == 8) and move it to the vk_device. Move all the descriptor pool and set tracking to the context - none of it is specific to pipelines anymore. The context now holds a single vector of pools, a single vector of sets, and one counter each to track requests and use.
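A minimal sketch of the shared-layout idea in plain Vulkan: one layout with a fixed number of storage-buffer bindings that every compute pipeline can reuse. This mirrors the description above, not the actual ggml-vulkan code:

```cpp
// One descriptor set layout with MAX_PARAMETER_COUNT storage-buffer
// bindings, created once and stored on the device for all pipelines.
#include <vulkan/vulkan.h>
#include <array>

static const uint32_t MAX_PARAMETER_COUNT = 8;

VkDescriptorSetLayout create_shared_dsl(VkDevice device) {
    std::array<VkDescriptorSetLayoutBinding, MAX_PARAMETER_COUNT> bindings{};
    for (uint32_t i = 0; i < MAX_PARAMETER_COUNT; ++i) {
        bindings[i].binding         = i;
        bindings[i].descriptorType  = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
        bindings[i].descriptorCount = 1;
        bindings[i].stageFlags      = VK_SHADER_STAGE_COMPUTE_BIT;
    }
    VkDescriptorSetLayoutCreateInfo info{};
    info.sType        = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
    info.bindingCount = MAX_PARAMETER_COUNT;
    info.pBindings    = bindings.data();

    VkDescriptorSetLayout layout = VK_NULL_HANDLE;
    vkCreateDescriptorSetLayout(device, &info, nullptr, &layout);
    return layout; // store once on the device, reuse for every pipeline
}
```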
…org#14062)

* webui: Wrap long numbers instead of infinite horizontal scroll
* Use tailwind class
* update index.html.gz
This change moves the command pool/buffer tracking into a vk_command_pool structure. There are two instances per context (for compute+transfer) and two instances per device for operations that don't go through a context. This should prevent separate contexts from stomping on each other.
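A rough sketch of what such a pool-plus-tracking structure can look like in plain Vulkan; the struct and field names here are illustrative, not the real vk_command_pool:

```cpp
// Self-contained command pool that owns and recycles its buffers, so two
// instances never touch each other's state.
#include <vulkan/vulkan.h>
#include <vector>

struct vk_command_pool_sketch {
    VkCommandPool                pool = VK_NULL_HANDLE;
    std::vector<VkCommandBuffer> buffers;    // all buffers allocated from pool
    size_t                       in_use = 0; // how many are currently handed out

    void init(VkDevice device, uint32_t queue_family) {
        VkCommandPoolCreateInfo info{};
        info.sType            = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO;
        info.queueFamilyIndex = queue_family;
        vkCreateCommandPool(device, &info, nullptr, &pool);
    }

    VkCommandBuffer get(VkDevice device) {
        if (in_use < buffers.size()) {
            return buffers[in_use++]; // reuse a previously allocated buffer
        }
        VkCommandBufferAllocateInfo alloc{};
        alloc.sType              = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
        alloc.commandPool        = pool;
        alloc.level              = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
        alloc.commandBufferCount = 1;
        VkCommandBuffer buf = VK_NULL_HANDLE;
        vkAllocateCommandBuffers(device, &alloc, &buf);
        buffers.push_back(buf);
        in_use++;
        return buf;
    }

    void reset(VkDevice device) {
        vkResetCommandPool(device, pool, 0); // recycle every buffer at once
        in_use = 0;
    }
};
```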
* ggml-cpu: Factor out feature detection build from x86

* ggml-cpu: Add ARM feature detection and scoring

This is analogous to cpu-feats-x86.cpp. However, to detect compile-time activation of features, we rely on GGML_USE_<FEAT>, which needs to be set in cmake, instead of GGML_<FEAT> that users would set for x86. This is because on ARM, users specify features with GGML_CPU_ARM_ARCH, rather than with individual flags.

* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for ARM

Like x86, however to pass around arch flags within cmake, we use GGML_INTERNAL_<FEAT> as we don't have GGML_<FEAT>. Some features are optional, so we may need to build multiple backends per arch version (armv8.2_1, armv8.2_2, ...), and let the scoring function sort out which one can be used.

* ggml-cpu: Limit ARM GGML_CPU_ALL_VARIANTS to Linux for now

The other platforms will need their own specific variants. This also fixes the bug that the variant-building branch was always being executed as the else-branch of GGML_NATIVE=OFF. The branch is moved to an elseif-branch which restores the previous behavior.
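For the runtime half of the ARM scoring described above, a minimal Linux/aarch64 sketch using getauxval; which bits are checked and how they are weighted is an assumption here, not the actual ggml-cpu logic:

```cpp
// Runtime ARM feature detection on Linux/aarch64 via the auxiliary vector.
// Compiles only on aarch64 Linux, where <asm/hwcap.h> defines these bits.
#include <sys/auxv.h>   // getauxval, AT_HWCAP, AT_HWCAP2
#include <asm/hwcap.h>  // HWCAP_* / HWCAP2_* bits
#include <cstdio>

static int score_arm_features(void) {
    unsigned long hwcap  = getauxval(AT_HWCAP);
    unsigned long hwcap2 = getauxval(AT_HWCAP2);
    (void) hwcap2; // may be unused with older kernel headers

    int score = 0;
    if (hwcap & HWCAP_ASIMDDP) score |= 1 << 0; // dotprod (armv8.2 optional)
    if (hwcap & HWCAP_SVE)     score |= 1 << 1; // scalable vector extension
#ifdef HWCAP2_I8MM
    if (hwcap2 & HWCAP2_I8MM)  score |= 1 << 2; // int8 matmul (armv8.6)
#endif
    return score; // load the backend variant with the highest supported score
}

int main(void) {
    printf("arm feature score: %d\n", score_arm_features());
    return 0;
}
```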
…gml-org#14140)

This fixes RWKV inference, which otherwise failed when the worst-case ubatch.n_seq_tokens rounded to 0.
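A tiny illustration of the failure mode: integer division of a small token count by a larger sequence count rounds to zero unless clamped. The variable names are illustrative, not the actual ubatch code:

```cpp
// Integer division rounds toward zero, so a worst-case split of few tokens
// across many sequences yields 0 tokens per sequence without a floor of 1.
#include <cstdio>

int main() {
    int n_tokens = 3;
    int n_seqs   = 8; // more sequences than tokens in the worst case

    int n_seq_tokens_bad  = n_tokens / n_seqs;             // 0 -> breaks RWKV
    int n_seq_tokens_good = n_tokens / n_seqs > 0
                          ? n_tokens / n_seqs : 1;         // clamp to >= 1

    printf("bad=%d good=%d\n", n_seq_tokens_bad, n_seq_tokens_good); // bad=0 good=1
}
```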
* cmake : handle whitespaces in path during metal build

ggml-ci

* cont : proper fix

ggml-ci

Co-authored-by: Daniel Bevenius <[email protected]>
1f1bf7b to c2e9de3
This PR only exists to make the GitHub Actions run.