forked from ggml-org/llama.cpp
Sync master with upstream release b6475 #250
Merged: jan-service-account merged 24 commits into dev from update-dev-from-master-2025-09-15-00-36 on Sep 15, 2025
Conversation
…" (ggml-org#15910) * Revert "sycl: add usage of enqueue_functions extension (ggml-org#14244)" This reverts commit 8308f98. * fix missed revert code, format the code
…g#15947)
* vulkan: implement ggml igpu device type, implement pci id support
* fix compiler warning
* prevent printf overflow warning
…g#15948)
The function 'output_reserve' has return type 'uint32_t', so an explicit cast is needed.
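A minimal sketch of the kind of fix described above; the function name `output_reserve` is real, but the body and signature here are illustrative stand-ins, not the PR's code. The point is that a `size_t` flowing into a `uint32_t` return path triggers narrowing-conversion warnings on some compilers, so the cast is made explicit:

```cpp
#include <cstdint>
#include <cstddef>

// illustrative stand-in: returning a size_t through a uint32_t return type
// needs an explicit cast, which also documents the intended narrowing
uint32_t output_reserve_example(size_t n_outputs) {
    return static_cast<uint32_t>(n_outputs);
}
```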
…15790)
Pull and run models via: llama-server -dr gemma3. Adds validators and sanitizers for Docker Model URLs and metadata.
Signed-off-by: Eric Curtin <[email protected]>
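The commit text does not show its validation rules; the following is a hypothetical sketch of a character-whitelist sanitizer for Docker-style model references. The function name and the exact rules are assumptions for illustration, not taken from the PR:

```cpp
#include <cctype>
#include <string>

// hypothetical whitelist check: Docker-style references are commonly limited
// to lowercase alphanumerics plus a few separators
static bool is_valid_ref_char(char c) {
    unsigned char uc = static_cast<unsigned char>(c);
    return std::islower(uc) || std::isdigit(uc) ||
           c == '.' || c == '_' || c == '-' || c == '/' || c == ':';
}

// rejects empty references and any reference with an unexpected character
bool validate_model_ref(const std::string & ref) {
    if (ref.empty()) {
        return false;
    }
    for (char c : ref) {
        if (!is_valid_ref_char(c)) {
            return false;
        }
    }
    return true;
}
```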
…d incorrect zTensor free (ggml-org#15839)
* metal : run graph ops concurrently
* cont : add flags for debugging and disabling concurrency
* cont : refactor and handle fusing
* cont : simplify - no need to use GPU address
* cont : prepare mem ranges for reuse + add ggml-metal-common.cpp
* cont : avoid redundant keywords in cpp [no ci]
* metal : reorder graph for better concurrency
* metal : fix race on mem pool buffers
* cont : add env GGML_METAL_GRAPH_OPTIMIZE_DISABLE
* cont : refactor, optimize, add comments
* cont : refactor ggml-metal.m
* minor : update logs [no ci]
* metal : refactor bin kernels loading
* metal : refactor rms kernel loading
* ci : try to add memory leaks check
* ci : try to enable memory leak detection for Mac
* cont : seems to be working
* llama : allow using iGPUs with --device
* mtmd : allow iGPU
* rpc-server : allow iGPU
…ers (ggml-org#15705)
Use this to query the register count for shader compiles on NVIDIA. Currently this is only for performance debugging, but it could eventually be used in heuristics such as split_k.
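The title above is truncated, but the description matches Vulkan's pipeline-executable-properties mechanism; treating that as an assumption, here is a minimal sketch of reading per-pipeline statistics such as register counts. It assumes VK_KHR_pipeline_executable_properties is enabled, the pipeline was created with VK_PIPELINE_CREATE_CAPTURE_STATISTICS_BIT_KHR, and the extension entry points have been resolved (e.g. via vkGetDeviceProcAddr):

```cpp
#include <vulkan/vulkan.h>

#include <cstdio>
#include <vector>

// sketch only: query each pipeline executable and print its uint64 counters
void print_pipeline_stats(VkDevice device, VkPipeline pipeline) {
    VkPipelineInfoKHR pipeline_info = { VK_STRUCTURE_TYPE_PIPELINE_INFO_KHR };
    pipeline_info.pipeline = pipeline;

    uint32_t exec_count = 0;
    vkGetPipelineExecutablePropertiesKHR(device, &pipeline_info, &exec_count, nullptr);

    for (uint32_t i = 0; i < exec_count; ++i) {
        VkPipelineExecutableInfoKHR exec_info = { VK_STRUCTURE_TYPE_PIPELINE_EXECUTABLE_INFO_KHR };
        exec_info.pipeline        = pipeline;
        exec_info.executableIndex = i;

        uint32_t stat_count = 0;
        vkGetPipelineExecutableStatisticsKHR(device, &exec_info, &stat_count, nullptr);

        std::vector<VkPipelineExecutableStatisticKHR> stats(stat_count,
            { VK_STRUCTURE_TYPE_PIPELINE_EXECUTABLE_STATISTIC_KHR });
        vkGetPipelineExecutableStatisticsKHR(device, &exec_info, &stat_count, stats.data());

        // drivers report named counters here; NVIDIA includes register usage
        for (const VkPipelineExecutableStatisticKHR & s : stats) {
            if (s.format == VK_PIPELINE_EXECUTABLE_STATISTIC_FORMAT_UINT64_KHR) {
                std::printf("%s: %llu\n", s.name, (unsigned long long) s.value.u64);
            }
        }
    }
}
```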
* vulkan: fix failing dequant shaders
* add missing const
* ggml-zdnn: rm user mapped buffers
* ggml-zdnn: rm dead code
* ggml-zdnn: attempt to fix missing extra data buffer free
Signed-off-by: Aaron Teo <[email protected]>
* doc : update documentation for --tensor-split
* Update tools/main/README.md
Co-authored-by: Johannes Gäßler <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
* releases : update ROCM, add gfx1200, gfx1201, gfx1151
* releases : set target to 13.3 for macos-x64
* add hipblaslt.dll to release
* add hipblaslt/library to release
Fix regression introduced with commit 50f4281
* metal : fix kernel requirements
* cont : fix supports_op
* cont : fix supports_op for ARGMAX
)
* build: fix the cache keys for the Windows HIP release job. Update the cache keys to include the HIP SDK version, preventing the use of outdated ROCm installation caches.
* build: sync changes from release.yml to build.yml
  - Update HIP SDK version to 25.Q3 and ROCm version to 6.4.2
  - Update the cache keys to reflect the new versions
* build: remove the Windows HIP release for gfx1151, since the current stable rocWMMA does not support gfx1151.
* vulkan: move mul_mm dequantization steps into a separate file and functions
* improve mul_mm vector load code
* fix debug mode issues and warnings
…adeon RX 9000 series (ggml-org#15994)
* rocm.Dockerfile: added the gfx1200 and gfx1201 architectures to support the AMD Radeon RX 9000 series. https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.1/reference/system-requirements.html#rdna-os states that the Radeon RX 9000 series is supported from Ubuntu 24.04.2, while the Dockerfile uses 24.04, which is ROCm 6.4. This fixes the `ROCm error: invalid device function` I was getting when trying to use the ROCm container.
* metal : remove mem pool usage
* metal : remove mem pool implementation
* metal : take into account the actual allocated memory of the tensor
* cont : use ggml_backend_buft_get_alloc_size (see the sketch after this list)
* cont : improve, comments
* cont : add functions for the extra tensor sizes
* metal : add comments
* metal : implement .get_alloc_size for the rest of the buffer types
* metal : remove ggml_metal_heap
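On the ggml_backend_buft_get_alloc_size item: that function and ggml_nbytes are part of ggml's public API, while the helper below is illustrative only. It shows the distinction the commit relies on, namely the bytes a tensor logically holds versus what a buffer type actually allocates for it (which may include padding or backend-specific extras):

```cpp
#include <stdio.h>

#include "ggml.h"
#include "ggml-backend.h"

// illustrative helper: compare a tensor's logical size with what the
// buffer type would actually allocate for it
void report_tensor_sizes(ggml_backend_buffer_type_t buft, struct ggml_tensor * t) {
    size_t logical = ggml_nbytes(t);                            // raw data size
    size_t actual  = ggml_backend_buft_get_alloc_size(buft, t); // may be larger

    printf("%s: nbytes=%zu, alloc_size=%zu\n", t->name, logical, actual);
}
```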
* add grok-2 support
* type fix (×3)
* "fix" vocab for invalid sequences
* fix expert tensor mapping and spaces in vocab
* add chat template
* fix norm tensor mapping
* rename layer_out_norm to ffn_post_norm
* ensure ffn_post_norm is mapped
* fix experts merging
* remove erroneous FFN_GATE entry
* concatenate split tensors and add more metadata
* process all expert layers and try cat instead of hstack
* add support for community BPE vocab
* fix expert feed forward length and ffn_down concat
* commit this too
* add ffn_up/gate/down, unsure if sequence is right
* add ffn_gate/down/up to tensor names
* correct residual moe (still not working)
* mess--
* fix embedding scale being applied twice
* add built-in chat template
* change beta fast for grok if default value
* remove spm vocab in favor of community bpe vocab
* change attention temp length metadata type to integer
* update attention temp length metadata
* remove comment
* replace M_SQRT2 with std::sqrt(2) (see the sketch after this list)
* add yarn metadata, move defaults to hparams
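On the M_SQRT2 item above: the macro comes from POSIX <math.h> rather than standard C++ (MSVC only defines it with _USE_MATH_DEFINES), so computing the constant with std::sqrt is the portable choice. A minimal illustration; the variable name is hypothetical, not from the PR:

```cpp
#include <cmath>

// was: const float scale = 1.0f / M_SQRT2;  // POSIX-only macro
const float scale = 1.0f / std::sqrt(2.0f);  // portable standard C++
```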
Updates dev branch with latest release (b6475) from ggml-org/llama.cpp