Sync master with upstream release b6056 #186

jan-service-account · 2025-08-01T07:40:24Z

Updates dev branch with latest release (b6056) from ggml-org/llama.cpp

* graph : avoid creating redundant s_copy views * graph : comment the s_copy views

…t. (ggml-org#14985) * CANN: Improve loading efficiency after converting weights to NZ format. * CANN: fix typo

* Add support for Llada-8b: diffusion model * Add README * Fix README and convert_hf_to_gguf * convert_hf_to_gguf.py: address review comments * Make everything in a single example * Remove model-specific sampling * Remove unused argmax * Remove braced initializers, improve README.md a bit * Add diffusion specific gguf params in set_vocab, remove setting rope_theta and rms_norm_eps * Remove adding the mask token * Move add_add_bos_token to set_vocab * use add_bool in gguf_writer.py

Signed-off-by: Lukas Straub <[email protected]>

…gml-org#14968)

* llama-server : implement universal assisted decoding * Erase prompt tail for kv-cache * set vocab_dft_compatible in common_speculative * rename ctx_main to ctx_tgt * move vocab_dft_compatible to spec struct * clear mem_dft, remove mem * detokenize id_last for incompatible models * update comment * add --spec-replace flag * accept special tokens when translating between draft/main models * Escape spec-replace * clamp draft result to size to params.n_draft * fix comment * clean up code * restore old example * log common_speculative_are_compatible in speculative example * fix * Update common/speculative.cpp Co-authored-by: Georgi Gerganov <[email protected]> * Update common/speculative.cpp Co-authored-by: Georgi Gerganov <[email protected]> * Update common/speculative.cpp Co-authored-by: Georgi Gerganov <[email protected]> --------- Co-authored-by: Georgi Gerganov <[email protected]>

* MODEL_TENSOR.SSM_DT_NORM has defined twice, and second overwritten the jamba model's layername * correct order

* support minicpm-v 4 * add md * support MiniCPM-o 4.0 * add default location * temp rm MiniCPM-o 4.0 * fix code * fix "minicpmv_projector" default path

* vulkan: fix debug mode issues * vulkan: remove broken check_results GGML_OP_SET_ROWS support

…ion (ggml-org#14990)

…gml-org#14992)

@JohannesGaessler

…ml-org#14392) * compare-commits.sh: support both llama-bench and test-backend-ops Signed-off-by: Xiaodong Ye <[email protected]> * Speed up the build by specifying -j 12 Signed-off-by: Xiaodong Ye <[email protected]> * Remove build_number from test-backend-ops db Signed-off-by: Xiaodong Ye <[email protected]> * Apply suggestion from @JohannesGaessler Co-authored-by: Johannes Gäßler <[email protected]> * Refine tool selection logic Signed-off-by: Xiaodong Ye <[email protected]> * Address review comments Signed-off-by: Xiaodong Ye <[email protected]> --------- Signed-off-by: Xiaodong Ye <[email protected]> Signed-off-by: Xiaodong Ye <[email protected]> Co-authored-by: Johannes Gäßler <[email protected]>

* docker: add cann build pipline * docker: add cann build pipline * docker: fix cann devops * cann : fix multi card hccl * Update ggml/src/ggml-cann/ggml-cann.cpp Co-authored-by: Xuan-Son Nguyen <[email protected]> * Update ggml-cann.cpp --------- Co-authored-by: Georgi Gerganov <[email protected]> Co-authored-by: Xuan-Son Nguyen <[email protected]>

ggml-ci

* Initial Q2_K Block Interleaving Implementation * Addressed review comments and clean up of the code * Post rebase fixes * Initial CI/CD fixes * Update declarations in arch-fallback.h * Changes for GEMV Q2_K in arch-fallback.h * Enable repacking only on AVX-512 machines * Update comments in repack.cpp * Address q2k comments --------- Co-authored-by: Manogna-Sree <[email protected]>

compilade and others added 17 commits July 31, 2025 08:02

graph : reduce splits for recurrent and hybrid models (ggml-org#14825)

66625a5

* graph : avoid creating redundant s_copy views * graph : comment the s_copy views

CANN: Improve loading efficiency after converting weights to NZ forma…

11490b3

…t. (ggml-org#14985) * CANN: Improve loading efficiency after converting weights to NZ format. * CANN: fix typo

server : add openai-style logit_bias support (ggml-org#14946)

a9f77a8

Signed-off-by: Lukas Straub <[email protected]>

llama : merge build_moe_ffn_from_probs function into build_moe_ffn (g…

c1dacaa

…gml-org#14968)

MODEL_TENSOR.SSM_DT_NORM has defined twice (ggml-org#14991)

36e5fe7

* MODEL_TENSOR.SSM_DT_NORM has defined twice, and second overwritten the jamba model's layername * correct order

mtmd : support MiniCPM-V 4.0 (ggml-org#14983)

952a47f

* support minicpm-v 4 * add md * support MiniCPM-o 4.0 * add default location * temp rm MiniCPM-o 4.0 * fix code * fix "minicpmv_projector" default path

Vulkan: Fix minor debug mode issues (ggml-org#14899)

e08a988

* vulkan: fix debug mode issues * vulkan: remove broken check_results GGML_OP_SET_ROWS support

llama : allow other bufts when overriding to CPU, add --no-repack opt…

d6818d0

…ion (ggml-org#14990)

Fix params bug in diffusion example (ggml-org#14993)

7845240

llama : add simple option to enable CPU for MoE weights (--cpu-moe) (g…

a06ed5f

…gml-org#14992)

quantize : skip tensor override when in fallback mode (ggml-org#14995)

daf2dd7

graph : fix equal_seq() check (ggml-org#14986)

ba42794

ggml-ci

jan-service-account merged commit 876aa7c into dev Aug 1, 2025
14 checks passed

jan-service-account deleted the update-dev-from-master-2025-08-01-07-40 branch August 1, 2025 07:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Sync master with upstream release b6056 #186

Sync master with upstream release b6056 #186

Uh oh!

jan-service-account commented Aug 1, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

17 participants

Sync master with upstream release b6056 #186

Sync master with upstream release b6056 #186

Uh oh!

Conversation

jan-service-account commented Aug 1, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

17 participants