[pull] master from ggml-org:master by pull[bot] · Pull Request #1118 · syther-labs/llama.cpp

pull · 2026-02-10T11:45:35Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

* cuda : extend GGML_OP_PAD to work with non-cont src0
* tests : add permuted pad
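To illustrate what padding a non-contiguous source involves, here is a minimal CPU sketch in two dimensions. It is not the CUDA kernel from the commit: `View2D` and `pad2d` are hypothetical names, strides are counted in elements rather than bytes, and zero padding at the end of each dimension is assumed. The point is only that the source is read through its strides, so a permuted view works without first being made contiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D view over float data (not a ggml type).
// s0 and s1 are strides in elements for dimension 0 and dimension 1.
struct View2D {
    const float *  data;
    int            ne0, ne1; // number of elements per dimension
    std::ptrdiff_t s0, s1;   // element strides; a permuted view has s0 != 1
};

// Copy src into a new contiguous buffer of size (ne0 + pad0) x (ne1 + pad1),
// filling the added region with zeros. Assumes padding at the end of each dimension.
std::vector<float> pad2d(const View2D & src, int pad0, int pad1) {
    assert(pad0 >= 0 && pad1 >= 0);
    const int out0 = src.ne0 + pad0;
    const int out1 = src.ne1 + pad1;
    std::vector<float> dst(static_cast<std::size_t>(out0) * out1, 0.0f);
    for (int i1 = 0; i1 < src.ne1; ++i1) {
        for (int i0 = 0; i0 < src.ne0; ++i0) {
            // Address the source through its strides instead of assuming a dense layout.
            dst[static_cast<std::size_t>(i1) * out0 + i0] = src.data[i1 * src.s1 + i0 * src.s0];
        }
    }
    return dst;
}
```

A transposed view of a 2x3 row-major buffer can be described by swapping the extents and strides (`ne0 = 2, ne1 = 3, s0 = 3, s1 = 1`) and passed to `pad2d` directly.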

Implement ggml_cann_mul_mat_id_quant function to support quantized matrix multiplication for Mixture of Experts (MoE) architectures on CANN backend.

Key features:
- Support Q4_0 and Q8_0 quantized weight formats
- Use IndexSelect to dynamically route expert-specific weights based on indices
- Leverage WeightQuantBatchMatmulV2 for efficient quantized computation
- Handle automatic F16 type conversion for hardware compatibility
- Support both per-expert and broadcast input modes

Implementation details:
- Extract expert weights and scales using CANN IndexSelect operation
- Process each batch and expert combination independently
- Create proper tensor views with correct stride for matmul operations
- Automatic input/output type casting to/from F16 as needed

Testing: All test cases passed for supported types (F32, F16, Q4_0, Q8_0).
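The routing idea described above can be sketched on the CPU as follows. This is only a reference illustration, not the CANN implementation: it does not call IndexSelect or WeightQuantBatchMatmulV2, and `BlockQ8`, `ExpertWeights`, and `mul_mat_id` are hypothetical names. The block layout loosely follows the Q8_0 idea (32 int8 values sharing one scale), with the scale held as a float here as a simplifying assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kBlock = 32; // values per quantization block

// Hypothetical stand-in for a Q8_0-style block: kBlock int8 values sharing one scale.
// Assumption: the scale is a plain float in this sketch.
struct BlockQ8 {
    float       d;          // per-block scale
    std::int8_t qs[kBlock]; // quantized values; real value is d * qs[i]
};

// One expert's weight matrix, row-major, quantized in blocks along each row.
struct ExpertWeights {
    int                  n_rows;
    int                  n_cols; // must be a multiple of kBlock
    std::vector<BlockQ8> blocks; // n_rows * (n_cols / kBlock) entries
};

// Select the weights of one expert by index and compute y = W[expert_id] * x,
// dequantizing each block on the fly.
std::vector<float> mul_mat_id(const std::vector<ExpertWeights> & experts,
                              int expert_id, const std::vector<float> & x) {
    assert(expert_id >= 0 && static_cast<std::size_t>(expert_id) < experts.size());
    const ExpertWeights & w = experts[expert_id]; // the "index select" step
    assert(w.n_cols % kBlock == 0);
    assert(x.size() == static_cast<std::size_t>(w.n_cols));
    const int blocks_per_row = w.n_cols / kBlock;
    std::vector<float> y(w.n_rows, 0.0f);
    for (int r = 0; r < w.n_rows; ++r) {
        for (int b = 0; b < blocks_per_row; ++b) {
            const BlockQ8 & blk = w.blocks[static_cast<std::size_t>(r) * blocks_per_row + b];
            float acc = 0.0f;
            for (int i = 0; i < kBlock; ++i) {
                acc += static_cast<float>(blk.qs[i]) * x[b * kBlock + i];
            }
            y[r] += blk.d * acc; // apply the block scale once per block
        }
    }
    return y;
}
```

In an MoE layer the expert index would come from the router output for each token, so the same input row may be multiplied by a different expert matrix on every call.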

…xtModel (#19445)

* Add special case for Qwen3VLMoe
* Fix down path, remove arrows and checkmarks
* ws
* Moved to Qwen3VL
* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

…ion (#19452)

Using the noexcept overload of std::filesystem::directory_entry::is_regular_file prevents the abnormal termination that occurs when the throwing overload raises an error (as caused by symlinks to non-existent folders on Linux).

Resolves: #18560

…ons (dotprod) (#19360)

* First working version of GEMM and GEMV
* interleave loads and compute
* Clang-format
* Added missing fallback. Removed tested TODO.
* Swap M and N to be consistent with the repack template convention

ggerganov and others added 7 commits February 10, 2026 08:07

cuda : extend GGML_OP_PAD to work with non-cont src0 (#19429)

a0d5855


CANN: Remove unnecessary wrapper for ggml_backend_buft_is_cann (#18968)

f0bfe54

tts : fix typos in README.md [no ci] (#19463)

66d403c

pull bot locked and limited conversation to collaborators Feb 10, 2026

pull bot added the ⤵️ pull label Feb 10, 2026

pull bot merged commit c03a5a4 into syther-labs:master Feb 10, 2026
17 of 20 checks passed

github-actions bot added the testing, python, ggml, examples, Nvidia GPU, and Ascend NPU labels Feb 10, 2026

pull[bot] merged 7 commits into syther-labs:master from ggml-org:master