
Conversation

@jan-service-account

Updates dev branch with latest release (b6475) from ggml-org/llama.cpp

NeoZhangJianyu and others added 24 commits on September 12, 2025 at 09:15
…" (ggml-org#15910)

* Revert "sycl: add usage of enqueue_functions extension (ggml-org#14244)"

This reverts commit 8308f98.

* fix missed revert code, format the code
… (ggml-org#15947)

* vulkan: implement ggml igpu device type, implement pci id support

* fix compiler warning

* prevent printf overflow warning
… (ggml-org#15948)

The function 'output_reserve' has a return type of 'uint32_t', so an explicit cast needs to be added.
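For illustration, here is a minimal sketch of the casting pattern such a fix addresses; the function body and caller below are assumptions, not the actual code:

```cpp
#include <cstdint>
#include <cstdio>

// hypothetical stand-in: the real output_reserve returns uint32_t
static uint32_t output_reserve(uint32_t n_outputs) {
    // ... reserve output buffers, return the count actually reserved
    return n_outputs;
}

int main() {
    const int32_t n_outputs_all = 64; // signed count used by the caller

    // without the explicit casts, mixing int32_t and uint32_t here
    // triggers sign-compare / implicit-conversion warnings
    if (output_reserve((uint32_t) n_outputs_all) < (uint32_t) n_outputs_all) {
        fprintf(stderr, "failed to reserve outputs\n");
        return 1;
    }
    return 0;
}
```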
… (ggml-org#15790)

Allow pulling and running models via: llama-server -dr gemma3
Add validators and sanitizers for Docker Model URLs and metadata

Signed-off-by: Eric Curtin <[email protected]>
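As a hedged illustration of what such a sanitizer can look like (this is not the actual implementation; `is_valid_model_ref` and its character rules are assumptions):

```cpp
#include <cctype>
#include <string>

// accept only characters legal in an OCI-style model reference:
// lowercase alphanumerics plus . _ - inside name components,
// with / and : as separators
static bool is_valid_model_ref(const std::string & ref) {
    if (ref.empty() || ref.size() > 256) {
        return false; // reject empty and absurdly long references
    }
    for (const unsigned char c : ref) {
        if (!(std::islower(c) || std::isdigit(c) ||
              c == '.' || c == '_' || c == '-' || c == '/' || c == ':')) {
            return false;
        }
    }
    return true;
}
```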
* metal : run graphs ops concurrently

ggml-ci

* cont : add flags for debugging and disabling concurrency

ggml-ci

* cont : refactor and handle fusing

ggml-ci

* cont : simplify - no need to use GPU address

ggml-ci

* cont : prepare mem ranges for reuse + add ggml-metal-common.cpp

ggml-ci

* cont : avoid redundant keywords in cpp [no ci]

* metal : reorder graph for better concurrency

ggml-ci

* metal : fix race on mem pool buffers

ggml-ci

* cont : add env GGML_METAL_GRAPH_OPTIMIZE_DISABLE

ggml-ci

* cont : refactor, optimize, add comments

ggml-ci

* cont : refactor ggml-metal.m

ggml-ci

* minor : update logs [no ci]
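A minimal sketch of the core idea behind these concurrency commits: two ops can be encoded without a barrier between them only if the memory ranges they touch do not overlap. All names below are illustrative, not the actual ggml-metal-common.cpp API:

```cpp
#include <cstddef>
#include <vector>

struct mem_range {
    size_t beg; // start offset within the buffer
    size_t end; // one past the last byte used
};

static bool ranges_overlap(const mem_range & a, const mem_range & b) {
    return a.beg < b.end && b.beg < a.end;
}

// an op may join the current concurrent batch only if its source range
// does not overlap anything already written by ops in the batch
static bool can_run_concurrently(const std::vector<mem_range> & written,
                                 const mem_range & src) {
    for (const auto & r : written) {
        if (ranges_overlap(r, src)) {
            return false; // dependency: a barrier is required first
        }
    }
    return true;
}
```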
* metal : refactor bin kernels loading

ggml-ci

* metal : refactor rms kernel loading

ggml-ci

* ci : try to add memory leaks check

ggml-ci

* ci : try to enable memory leak detection for Mac

* cont : seems to be working
* llama : allow using iGPUs with --device

* mtmd : allow iGPU

* rpc-server : allow iGPU
…ers (ggml-org#15705)

Use this to query the register count for shader compiles on NVIDIA. Currently
this is only for performance debugging, but it could eventually be used in
heuristics such as split_k.
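A hedged sketch of how such a query can be made through the VK_KHR_pipeline_executable_properties extension. It assumes the pipeline was created with VK_PIPELINE_CREATE_CAPTURE_STATISTICS_BIT_KHR and the extension is enabled; this is not necessarily how the PR implements it:

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

// print all UINT64 statistics for each executable in the pipeline;
// on NVIDIA the register count is reported as one of these statistics
static void print_pipeline_stats(VkDevice device, VkPipeline pipeline) {
    // extension entry points must be fetched at runtime
    auto get_props = (PFN_vkGetPipelineExecutablePropertiesKHR)
        vkGetDeviceProcAddr(device, "vkGetPipelineExecutablePropertiesKHR");
    auto get_stats = (PFN_vkGetPipelineExecutableStatisticsKHR)
        vkGetDeviceProcAddr(device, "vkGetPipelineExecutableStatisticsKHR");
    if (!get_props || !get_stats) {
        return; // extension not available
    }

    VkPipelineInfoKHR pinfo = { VK_STRUCTURE_TYPE_PIPELINE_INFO_KHR };
    pinfo.pipeline = pipeline;

    uint32_t n_exec = 0;
    get_props(device, &pinfo, &n_exec, nullptr);

    for (uint32_t i = 0; i < n_exec; ++i) {
        VkPipelineExecutableInfoKHR einfo = { VK_STRUCTURE_TYPE_PIPELINE_EXECUTABLE_INFO_KHR };
        einfo.pipeline        = pipeline;
        einfo.executableIndex = i;

        uint32_t n_stat = 0;
        get_stats(device, &einfo, &n_stat, nullptr);

        std::vector<VkPipelineExecutableStatisticKHR> stats(n_stat,
            { VK_STRUCTURE_TYPE_PIPELINE_EXECUTABLE_STATISTIC_KHR });
        get_stats(device, &einfo, &n_stat, stats.data());

        for (const auto & s : stats) {
            if (s.format == VK_PIPELINE_EXECUTABLE_STATISTIC_FORMAT_UINT64_KHR) {
                printf("%s: %llu\n", s.name, (unsigned long long) s.value.u64);
            }
        }
    }
}
```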
* vulkan: fix failing dequant shaders

* add missing const
* ggml-zdnn: rm user mapped buffers

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: rm dead code

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: attempt to fix missing extra data buffer free

Signed-off-by: Aaron Teo <[email protected]>

---------

Signed-off-by: Aaron Teo <[email protected]>
* doc : update documentation for --tensor-split

* Update tools/main/README.md

Co-authored-by: Johannes Gäßler <[email protected]>

* Update tools/main/README.md

Co-authored-by: Diego Devesa <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
* releases : update ROCM, add gfx1200, gfx1201, gfx1151

* releases : set target to 13.3 for macos-x64

* add hipblaslt.dll to release

* add hipblaslt/library to release
* metal : fix kernel requirements

ggml-ci

* cont : fix supports_op

* cont : fix supports_op for ARGMAX
…)

* build: fix the cache keys for Windows HIP release job

Update the cache keys to include the HIP SDK version, preventing the
use of outdated ROCm installation caches.

* build: sync changes from release.yml to build.yml

- Update HIP SDK version to 25.Q3 and ROCm version to 6.4.2
- Update the cache keys to reflect the new versions

* build: remove Windows HIP release for gfx1151
since the current stable rocWMMA does not support gfx1151.
* vulkan: move mul_mm dequantization steps into a separate file and functions

* improve mul_mm vector load code

* fix debug mode issues and warnings
rocm.Dockerfile: added gfx1200, gfx1201 architectures to support AMD Radeon RX 9000 series (ggml-org#15994)

* rocm.Dockerfile: added gfx1200, gfx1201 architectures to support AMD Radeon RX 9000 series

https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.1/reference/system-requirements.html#rdna-os
states that the Radeon RX 9000 series is supported from Ubuntu 24.04.2, and the Dockerfile uses 24.04, which corresponds to ROCm 6.4.

This fixes the `ROCm error: invalid device function` I was getting when trying to use the ROCm container.
* metal : remove mem pool usage

ggml-ci

* metal : remove mem pool implementation

ggml-ci

* metal : take into account the actual allocated memory of the tensor

ggml-ci

* cont : use ggml_backend_buft_get_alloc_size

ggml-ci

* cont : improve, comments

ggml-ci

* cont : add functions for the extra tensor sizes

* metal : add comments

ggml-ci

* metal : implement .get_alloc_size for the rest of the buffer types

ggml-ci

* metal : remove ggml_metal_heap

ggml-ci
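The commits above boil down to one accounting rule: the bytes a backend actually allocates for a tensor can exceed ggml_nbytes() (alignment padding or backend-specific extra data), so sizes should be asked from the buffer type. A minimal sketch using the public ggml-backend API; graph_alloc_size is a hypothetical helper, not the PR's code:

```cpp
#include "ggml.h"
#include "ggml-backend.h"

// sum the real allocation sizes of all nodes in a graph
static size_t graph_alloc_size(ggml_backend_buffer_type_t buft,
                               struct ggml_cgraph * graph) {
    size_t total = 0;
    for (int i = 0; i < ggml_graph_n_nodes(graph); ++i) {
        struct ggml_tensor * node = ggml_graph_node(graph, i);
        // may be larger than ggml_nbytes(node) due to padding/extra data
        total += ggml_backend_buft_get_alloc_size(buft, node);
    }
    return total;
}
```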
* add grok-2 support

* type fix

* type fix

* type fix

* "fix" vocab for invalid sequences

* fix expert tensor mapping and spaces in vocab

* add chat template

* fix norm tensor mapping

* rename layer_out_norm to ffn_post_norm

* ensure ffn_post_norm is mapped

* fix experts merging

* remove erroneous FFN_GATE entry

* concatenate split tensors and add more metadata

* process all expert layers and try cat instead of hstack

* add support for community BPE vocab

* fix expert feed forward length and ffn_down concat

* commit this too

* add ffn_up/gate/down, unsure if sequence is right

* add ffn_gate/down/up to tensor names

* correct residual moe (still not working)

* mess--

* fix embedding scale being applied twice

* add built in chat template

* change beta fast for grok if default value

* remove spm vocab in favor of community bpe vocab

* change attention temp length metadata type to integer

* update attention temp length metadata

* remove comment

* replace M_SQRT2 with std::sqrt(2)

* add yarn metadata, move defaults to hparams
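On the M_SQRT2 change: M_SQRT2 is a POSIX math macro, not guaranteed by standard C++, so computing the constant directly is the portable route. A one-line sketch (attn_scale is a hypothetical name):

```cpp
#include <cmath>

// before (POSIX-only): const float attn_scale = 1.0f / M_SQRT2;
const float attn_scale = 1.0f / std::sqrt(2.0f); // standard C++
```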
jan-service-account merged commit f453acf into dev on Sep 15, 2025
3 checks passed
jan-service-account deleted the update-dev-from-master-2025-09-15-00-36 branch on September 15, 2025 at 00:37