@jan-service-account
Updates dev branch with latest release (b6062) from ggml-org/llama.cpp

lhez and others added 6 commits August 1, 2025 13:15
* support hunyuan_v1_dense

Signed-off-by: stevenkuang <[email protected]>

* update hunyuan_moe to hunyuan_v1_moe

Signed-off-by: stevenkuang <[email protected]>

* fix rope alpha assert and bos token

Signed-off-by: stevenkuang <[email protected]>

* add blank line

Signed-off-by: stevenkuang <[email protected]>

* Revert "update hunyuan_moe to hunyuan_v1_moe"

This reverts commit aa973ca.

* use hunyuan_dense instead of hunyuan_v1_dense

Signed-off-by: stevenkuang <[email protected]>

* fix hunyuan_moe chat template

Signed-off-by: stevenkuang <[email protected]>

* remove leftover code

Signed-off-by: stevenkuang <[email protected]>

* update hunyuan dense chat template

Signed-off-by: stevenkuang <[email protected]>

* fix hunyuan dense vocab and chat template

Signed-off-by: stevenkuang <[email protected]>

---------

Signed-off-by: stevenkuang <[email protected]>
* vendor : update vendored copy of google/minja

Signed-off-by: Lennart Austenfeld <[email protected]>

* Re-remove trailing whitespace

Signed-off-by: Lennart Austenfeld <[email protected]>

* Remove another trailing whitespace

Signed-off-by: Lennart Austenfeld <[email protected]>

---------

Signed-off-by: Lennart Austenfeld <[email protected]>
* vulkan: optimizations for direct convolution

- Empirically choose a better tile size. Reducing BS_K/BS_NPQ helps fill
  the GPU. The new size should be amenable to using coopmat, too.
- Fix shmem bank conflicts. 16B padding should work with coopmat.
- Some explicit loop unrolling.
- Skip math/stores work for parts of the tile that are OOB.
- Apply fastdiv opt.
- Disable shuffles for NV.
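
For readers unfamiliar with the "fastdiv" item above: the general idea is to replace an integer division by a loop-invariant divisor with a multiply-high, an add, and a shift, using constants precomputed on the host. The following is a minimal C++ sketch of that general technique (the Granlund-Montgomery style of division by an invariant integer), not the shader code from this commit. The names `FastDiv`, `make_fastdiv`, and `fastdiv` are illustrative assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Precomputed constants for dividing by a fixed divisor d.
// (Illustrative names; not taken from the commit.)
struct FastDiv {
    uint32_t mp;     // "magic" multiplier
    uint32_t shift;  // ceil(log2(d))
};

// Host-side setup: compute the multiplier and shift once per divisor.
inline FastDiv make_fastdiv(uint32_t d) {
    assert(d != 0 && "divisor must be non-zero");
    uint32_t L = 0;
    while (L < 32 && (uint32_t{1} << L) < d) {
        ++L;  // smallest L with 2^L >= d
    }
    const uint64_t two_pow_L = uint64_t{1} << L;
    const uint32_t mp =
        static_cast<uint32_t>(((uint64_t{1} << 32) * (two_pow_L - d)) / d + 1);
    return {mp, L};
}

// Per-element work: one multiply-high, one add, one shift, no division.
inline uint32_t fastdiv(uint32_t n, FastDiv f) {
    const uint32_t hi =
        static_cast<uint32_t>((static_cast<uint64_t>(n) * f.mp) >> 32);
    // The sum can exceed 32 bits, so it is formed in 64 bits before shifting.
    return static_cast<uint32_t>((static_cast<uint64_t>(hi) + n) >> f.shift);
}
```

In a convolution kernel this kind of helper would typically be used to split a flat index into its coordinates (for example, an output index into row and column) without a hardware divide. A GPU implementation may restrict the input range so the add fits in 32 bits; the sketch above widens to 64 bits instead.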

* Three tile sizes for CONV_2D, and a heuristic to choose

* reallow collectives for pre-Turing

* make SHMEM_PAD a spec constant

* fixes for intel perf - no shmem padding, placeholder shader core count

* shader variants with/without unrolling

* 0cc4m's fixes for AMD perf

Co-authored-by: 0cc4m <[email protected]>

---------

Co-authored-by: 0cc4m <[email protected]>
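
The commit list above mentions three tile sizes for CONV_2D, a heuristic to choose between them, and a shader core count. One plausible shape for such a heuristic is sketched below in C++: prefer the largest tile that still produces enough workgroups to keep every shader core busy. This is an assumption-laden illustration, not the code merged here. The candidate tile shapes, the factor of two, the reading of `K` as output channels and `NPQ` as batch times output height times output width, and all function names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// One output tile handled by a single workgroup (hypothetical struct).
struct TileShape {
    uint32_t bs_k;    // tile extent along K
    uint32_t bs_npq;  // tile extent along NPQ
};

// Hypothetical candidates, ordered from largest to smallest.
constexpr std::array<TileShape, 3> kCandidates = {{
    {128, 128},
    {64, 64},
    {32, 32},
}};

inline uint64_t ceil_div(uint64_t a, uint64_t b) {
    return (a + b - 1) / b;
}

// Pick the largest tile that yields at least two workgroups per shader core;
// fall back to the smallest tile when the problem is too small for that.
inline TileShape choose_tile(uint32_t K, uint32_t NPQ, uint32_t shader_core_count) {
    for (const TileShape& t : kCandidates) {
        const uint64_t workgroups = ceil_div(K, t.bs_k) * ceil_div(NPQ, t.bs_npq);
        if (workgroups >= 2ull * shader_core_count) {
            return t;
        }
    }
    return kCandidates.back();
}
```

Under these assumptions, a large problem keeps the big tile (better data reuse per workgroup), while a small problem drops to a smaller tile so that more workgroups exist to fill the GPU. On devices that do not report a core count, a placeholder value would have to be supplied for `shader_core_count`.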
@jan-service-account jan-service-account merged commit 1749cf1 into dev Aug 2, 2025
12 checks passed
@jan-service-account jan-service-account deleted the update-dev-from-master-2025-08-02-09-09 branch August 2, 2025 09:22

7 participants