Skip to content

[pull] master from ggml-org:master#1118

Merged
pull[bot] merged 7 commits intosyther-labs:masterfrom
ggml-org:master
Feb 10, 2026
Merged

[pull] master from ggml-org:master#1118
pull[bot] merged 7 commits intosyther-labs:masterfrom
ggml-org:master

Conversation

@pull
Copy link

@pull pull bot commented Feb 10, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

ggerganov and others added 7 commits February 10, 2026 08:07
* cuda : extend GGML_OP_PAD to work with non-cont src0

* tests : add permuted pad
Implement ggml_cann_mul_mat_id_quant function to support quantized matrix
multiplication for Mixture of Experts (MoE) architectures on CANN backend.

Key features:
- Support Q4_0 and Q8_0 quantized weight formats
- Use IndexSelect to dynamically route expert-specific weights based on indices
- Leverage WeightQuantBatchMatmulV2 for efficient quantized computation
- Handle automatic F16 type conversion for hardware compatibility
- Support both per-expert and broadcast input modes

Implementation details:
- Extract expert weights and scales using CANN IndexSelect operation
- Process each batch and expert combination independently
- Create proper tensor views with correct stride for matmul operations
- Automatic input/output type casting to/from F16 as needed

Testing: All test cases passed for supported types (F32, F16, Q4_0, Q8_0).
…xtModel (#19445)

* Add special case for Qwen3VLMoe

* Fix down path, remove arrows and checkmarks

* ws

* Moved to Qwen3VL

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
…ion (#19452)

using noexcept std::filesystem::directory_entry::is_regular_file
overload prevents abnormal termination upon throwing an error
(as caused by symlinks to non-existent folders on linux)

Resolves: #18560
…ons (dotprod) (#19360)

* First working version of GEMM and GEMV

* interleave loads and compute

* Clang-format

* Added missing fallback. Removed tested TODO.

* Swap M and N to be consistent with the repack template convention
@pull pull bot locked and limited conversation to collaborators Feb 10, 2026
@pull pull bot added the ⤵️ pull label Feb 10, 2026
@pull pull bot merged commit c03a5a4 into syther-labs:master Feb 10, 2026
17 of 20 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants