forked from ggml-org/llama.cpp
Sync master with upstream release b6153 #204
Merged: jan-service-account merged 13 commits into dev from update-dev-from-master-2025-08-15-00-13 on Aug 15, 2025.
Conversation
…over RPC (macOS & others) (ggml-org#15188)

* ggml-rpc: chunk send()/recv() to avoid EINVAL for very large tensors over RPC (macOS & others). Fixes ggml-org#15055
* ggml-rpc: rename RPC_IO_CHUNK->MAX_CHUNK_SIZE, use std::min() for cap, switch to GGML_LOG_ERROR, handle 0-length send/recv
* rpc: drop n==0 special case in send_data(); retry in loop per review
* rpc: remove trailing whitespace in send_data()

---------

Co-authored-by: Shinnosuke Takagi <[email protected]>
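As an aside on the chunking technique above, here is a minimal sketch of chunked socket writes, assuming POSIX send() and a 1 MiB cap. The cap value, error handling, and exact signature are assumptions for illustration; the real ggml-rpc send_data() may differ.

```cpp
// Illustrative sketch of chunked socket I/O, assuming POSIX send().
// MAX_CHUNK_SIZE and the function name come from the commit message above;
// the value 1 MiB and the details are assumptions, not the actual ggml-rpc code.
#include <sys/types.h>
#include <sys/socket.h>
#include <algorithm>
#include <cstddef>

static constexpr size_t MAX_CHUNK_SIZE = 1024 * 1024; // assumed per-syscall cap

static bool send_data(int sockfd, const void * data, size_t size) {
    size_t bytes_sent = 0;
    while (bytes_sent < size) {
        // Cap each send() so very large tensors never exceed what the OS accepts
        // in a single call (avoids EINVAL on macOS for multi-GB buffers).
        const size_t n = std::min(size - bytes_sent, MAX_CHUNK_SIZE);
        const ssize_t ret = send(sockfd, (const char *) data + bytes_sent, n, 0);
        if (ret < 0) {
            return false; // the caller logs the failure (GGML_LOG_ERROR in the real code)
        }
        // ret may be smaller than n; the loop retries until everything is written.
        // A zero-length input never enters the loop, which covers the 0-length case.
        bytes_sent += (size_t) ret;
    }
    return true;
}
```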
…vement on kernel-level and 10% perf increase for Gemma3n (ggml-org#15132)

* Factor out `reduce_rows_f32` from common.cuh

This increases iteration cycle speed by not having to recompile every kernel all the time.

* Hide memory latency by loop unrolling in reduce_rows_f32

* Further optimizations to `reduce_rows_f32`

1. Increase threadblock size to better hide the latency of memory requests. As a consequence of bigger threadblocks, do a 2-step summation, using shared memory to communicate results between invocations.
2. Use a sum_temp array to reduce waits on sum.
3. Adjust num_unroll to reflect the bigger threadblock.
4. Improve default block_dims, increase support for more block_dims.

* Add perf tests for `reduce_rows_f32` kernel

* Add heuristic to toggle 128/512 threads based on SM count

The break-even point was the minimum of the following multiples.

| GPU Model | Nrow SM Count Multiple |
| ----------- | ----------- |
| RTX 4000 SFF ADA | 2.0x |
| RTX 6000 ADA | 2.5x |
| RTX PRO 6000 Blackwell Max-Q | 3.04x |
| RTX PRO 4500 Blackwell | 3.15x |

* Ensure perf gains also for small ncols and large nrows

As an alternative, one could also have made the number of unrollings template-able, but that would require compiling the kernel multiple times, increasing binary size unnecessarily.

* Modify perf and unit tests

* Apply auto-formatting by clang

* Fix CI build failure

See https://github.com/ggml-org/llama.cpp/actions/runs/16798370266/job/47573716079?pr=15132#step:7:486
Building with the VS generator worked, though.

* Remove sm_count property from `ggml_backend_cuda_context`

Requested by @JohannesGaessler, and should fix remaining CI issues as a side effect.

* Add CUB-based implementation for GGML_OP_MEAN

Currently this branch is only executed for nrows==1.

* Add heuristics to execute CUB branch only when it brings perf

Heuristics were determined on the following HW:
* RTX 4000 SFF ADA
* RTX 6000 ADA
* RTX PRO 6000 Blackwell Max-Q
* RTX PRO 4500 Blackwell

* Add unit test for CUB-based mean

Tests should run with CUDA Graphs enabled per default on NVGPUs.

* Rename `USE_CUB` to `GGML_CUDA_USE_CUB`

Suggested by @JohannesGaessler

* Unindent preprocessor directives

See ggml-org#15132 (comment)
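To picture the SM-count heuristic mentioned above, a rough host-side sketch follows. The break-even factor and the direction of the comparison are assumptions inferred from the commit message and its table (break-even at roughly 2.0x to 3.15x the SM count); the actual ggml-cuda launch code differs.

```cpp
// Hypothetical sketch of the 128-vs-512-thread block-size heuristic.
// The factor 3.0 is an assumed value within the measured 2.0x-3.15x range,
// not the exact constant used in ggml-cuda.
#include <cstdint>

static int reduce_rows_block_size(int64_t nrows, int sm_count) {
    // Assumption: with few rows relative to the SM count there are not enough
    // blocks to fill the GPU, so larger 512-thread blocks hide memory latency
    // better; once nrows exceeds the break-even multiple of the SM count,
    // 128-thread blocks give the scheduler more independent blocks.
    const double break_even_factor = 3.0; // assumed
    return nrows < break_even_factor * sm_count ? 512 : 128;
}
```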
* Changed the CI file to hw
* Changed the CI file to hw
* Added to sudoers for apt
* Removed the clone command and used checkout
* Added libcurl
* Added gcc-14
* Checking gcc --version
* Added gcc-14 symlink
* Added CC and C++ variables
* Added the gguf weight
* Changed the weights path
* Added system specification
* Removed white spaces
* ci: Replace Jenkins riscv native build Cloud-V pipeline with GitHub Actions workflow

Removed the legacy .devops/cloud-v-pipeline Jenkins CI configuration and introduced .github/workflows/build-riscv-native.yml for native RISC-V builds using GitHub Actions.

* Removed trailing whitespaces

---------

Co-authored-by: Akif Ejaz <[email protected]>
…e-draft parameters (ggml-org#15191)

* Checkpoint from VS Code for coding agent session
* Initial plan
* Fix typo in --override-tensor-draft flag implementation
* Add null termination for speculative tensor buffer overrides
* Apply suggestions from code review
* Apply suggestions from code review
* Extract tensor override parsing logic to common function (addresses @slaren's feedback)
* Apply suggestions from code review
* Apply suggestions

---------

Co-authored-by: Sigbjørn Skjæret <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
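For context, --override-tensor-draft mirrors the existing --override-tensor option but targets the draft model used for speculative decoding: a comma-separated list of tensor-name-pattern=buffer-type pairs. Below is a minimal sketch of that kind of parsing; the function name and the pair-of-strings return type are illustrative, not the actual common.cpp helper, which resolves buffer types via the ggml backend API and null-terminates the resulting override array.

```cpp
// Illustrative parser for override strings of the form
//   "blk\.[0-9]+\.ffn_.*=CPU,token_embd=GPU0"
// i.e. comma-separated <tensor-name-pattern>=<buffer-type> pairs.
// Names and types here are assumptions for the sketch only.
#include <string>
#include <utility>
#include <vector>

static std::vector<std::pair<std::string, std::string>>
parse_tensor_overrides(const std::string & arg) {
    std::vector<std::pair<std::string, std::string>> overrides;
    size_t start = 0;
    while (start < arg.size()) {
        size_t end = arg.find(',', start);
        if (end == std::string::npos) { end = arg.size(); }
        const std::string item = arg.substr(start, end - start);
        const size_t eq = item.find('=');
        if (eq != std::string::npos) {
            // Left side: regex pattern matched against tensor names;
            // right side: target buffer type (e.g. a CPU or GPU backend).
            overrides.emplace_back(item.substr(0, eq), item.substr(eq + 1));
        }
        start = end + 1;
    }
    return overrides;
}
```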
* update `rope_multi`:

1. add `ggml_rope_multi_inplace`;
2. use `GGML_MROPE_SECTIONS` instead of 4.

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>
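The second point is a readability change: the multi-rope code indexes a fixed number of rotary sections, and the magic number 4 is replaced by the named constant. A minimal sketch of the pattern, with the section values purely illustrative and not taken from the ggml headers:

```cpp
// Sketch of the named-constant pattern described above. GGML_MROPE_SECTIONS
// is the constant named in the commit message (the multi-rope section count);
// the example values below are made up for illustration.
#define GGML_MROPE_SECTIONS 4

static void init_mrope_sections(int sections[GGML_MROPE_SECTIONS]) {
    // Each entry is the number of rotary dimensions assigned to one section
    // of the multi-rope split; values are illustrative only.
    const int example[GGML_MROPE_SECTIONS] = { 16, 24, 24, 0 };
    for (int i = 0; i < GGML_MROPE_SECTIONS; ++i) {
        sections[i] = example[i];
    }
}
```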
…ggml-org#15295)

The flake.nix included references to the llama-cpp.cachix.org cache with a comment claiming it is 'Populated by the CI in ggml-org/llama.cpp', but:

1. No visible CI workflow populates this cache.
2. The cache is empty for recent builds (tested b6150, etc.).
3. This misleads users into expecting pre-built binaries that don't exist.

This change removes the non-functional cache references entirely, leaving only the working cuda-maintainers cache that actually provides CUDA dependencies. Users can still manually add the llama-cpp cache if it becomes functional in the future.
…org#15303)

* perplexity: give more information about constraints on failure

This checks whether -np is insufficient vs context, and provides clues as to how much is needed for each.

* log formatting
* log error and return instead of storing max_seq_exceeded int
* check if s0 is zero for -np check
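The check described in the first bullet amounts to comparing the context available per parallel sequence with what each evaluation chunk needs. A hedged sketch of that kind of diagnostic follows; all names are placeholders, not the actual perplexity.cpp variables.

```cpp
// Illustrative version of the "-np vs context" diagnostic described above.
// n_ctx is the total context, n_parallel the value of -np, and n_chunk_tokens
// the tokens each sequence must hold; names are assumptions for this sketch.
#include <cstdio>

static bool check_parallel_context(int n_ctx, int n_parallel, int n_chunk_tokens) {
    if (n_parallel < 1) {
        fprintf(stderr, "error: -np must be at least 1\n");
        return false;
    }
    const int n_ctx_per_seq = n_ctx / n_parallel;
    if (n_ctx_per_seq < n_chunk_tokens) {
        // Log the error and return instead of failing later with a cryptic message,
        // telling the user how much each sequence gets and how much it needs.
        fprintf(stderr,
                "error: context per sequence (%d) is smaller than required (%d); "
                "increase the context size or reduce -np (currently %d)\n",
                n_ctx_per_seq, n_chunk_tokens, n_parallel);
        return false;
    }
    return true;
}
```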
Updates dev branch with latest release (b6153) from ggml-org/llama.cpp