Conversation

@jan-service-account

Updates dev branch with latest release (b5371) from ggml-org/llama.cpp

ggerganov and others added 10 commits May 13, 2025 14:02
* feat: Add GGUF conversion for granitemoeshared

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: hparam and arch plumbing for granitemoeshared

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <[email protected]>

* fix: Split MoE fused tensors for shared experts in conversion

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: First WIP cut at model arch in cpp

The hparam and architecture plumbing should be correct, but the
implementation of the shared experts seems to still be broken.

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <[email protected]>

* fix: Cleaner (maybe more correct?) splitting for gate/up

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <[email protected]>
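
As a rough illustration of the splitting described in the commit above (this is not the actual convert_hf_to_gguf.py code; the tensor names and the gate-before-up stacking order are assumptions):

```python
# Hypothetical sketch: splitting a fused shared-expert gate/up tensor during
# GGUF conversion. Assumes the fused tensor has shape (2 * n_ff_shexp, n_embd)
# with the gate half stacked before the up half along dim 0.
import numpy as np

def split_shared_gate_up(fused: np.ndarray, n_ff_shexp: int):
    """Split a fused (2*n_ff_shexp, n_embd) tensor into gate and up halves."""
    assert fused.shape[0] == 2 * n_ff_shexp, "unexpected fused tensor shape"
    gate = fused[:n_ff_shexp, :]   # would map to e.g. ffn_gate_shexp.weight
    up   = fused[n_ff_shexp:, :]   # would map to e.g. ffn_up_shexp.weight
    return gate, up

# Toy usage: n_embd=8, n_ff_shexp=16
fused = np.random.rand(2 * 16, 8).astype(np.float32)
gate, up = split_shared_gate_up(fused, 16)
print(gate.shape, up.shape)  # (16, 8) (16, 8)
```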

* fix: Fix the input to the shared experts

I had misread the architecture: the shared experts take their input from _before_ the
standard MoE layer, but I was feeding the output of the MoE to the shared experts.

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <[email protected]>
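
To make the corrected dataflow concrete, here is a toy NumPy sketch (not llama.cpp code; the SwiGLU feed-forward shape and weight layout are assumptions): the routed MoE and the shared experts both receive the same input x, and their outputs are summed, rather than feeding the MoE output into the shared experts.

```python
# Conceptual sketch of the corrected shared-expert dataflow.
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def ffn(x, w_gate, w_up, w_down):
    # SwiGLU-style feed-forward, as used by the Granite family
    return (silu(x @ w_gate.T) * (x @ w_up.T)) @ w_down.T

def moe_with_shared(x, routed_moe, shared_weights):
    w_gate, w_up, w_down = shared_weights
    routed_out = routed_moe(x)                  # standard top-k routed experts
    shared_out = ffn(x, w_gate, w_up, w_down)   # fed x, NOT routed_out
    return routed_out + shared_out

# Toy usage: an identity "router" just to exercise the shapes
n_embd, n_ff = 8, 16
rng = np.random.default_rng(0)
W = tuple(rng.standard_normal(s).astype(np.float32)
          for s in [(n_ff, n_embd), (n_ff, n_embd), (n_embd, n_ff)])
x = rng.standard_normal((4, n_embd)).astype(np.float32)
y = moe_with_shared(x, routed_moe=lambda t: t, shared_weights=W)
print(y.shape)  # (4, 8)
```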

* fix: Avoid architecture-specific checks for Granite MoE Shared

This is a cleaner way that will allow more flexibility in architecture
strings going forward.

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <[email protected]>
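
A minimal sketch of the idea (the hparam name mirrors llama.cpp naming conventions but is an assumption here, not a quote of the actual code): gate the shared-expert path on whether the relevant hyperparameter is set, rather than on the architecture string.

```python
# Illustrative only: capability check instead of an architecture-name check.
def has_shared_experts(hparams: dict) -> bool:
    return hparams.get("n_ff_shexp", 0) > 0

# Works for any current or future architecture string that sets the hparam,
# rather than hard-coding something like `arch == "granitemoeshared"`.
print(has_shared_experts({"n_ff_shexp": 1024}))  # True
print(has_shared_experts({}))                    # False
```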

* refactor: Split granite architectures out of llm_build_llama

This helps de-clutter the llama-family graph construction and allows
granite to diverge further (in preparation for Granite 4).

NOTE: I removed the granite scale factors from llm_build_deci because they
appear to only be there as copy-paste from llm_build_llama. The HF config
does not seem to set those values:
https://huggingface.co/Deci/DeciLM-7B/blob/main/config.json

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <[email protected]>
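
For context, a hedged sketch of where the Granite scale factors come from (the field names follow the HF Granite config family; treating a missing field as 1.0 is what makes the factors a no-op for Llama/Deci-style configs such as DeciLM-7B, which is why the copy-pasted code could be dropped):

```python
# Rough illustration: reading Granite-style scale factors from an HF config.
def get_scales(hf_config: dict):
    return {
        "embedding": hf_config.get("embedding_multiplier", 1.0),
        "residual":  hf_config.get("residual_multiplier", 1.0),
        "attention": hf_config.get("attention_multiplier", 1.0),
        "logits":    hf_config.get("logits_scaling", 1.0),
    }

print(get_scales({"residual_multiplier": 0.22}))  # Granite-style config
print(get_scales({}))                             # DeciLM-7B: all default to 1.0
```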

* fix: Fix compiler warning about uninitialized inp_pos

This code path should not have been reachable, but it triggers a warning on some compilers.

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <[email protected]>

* fix: Consolidate GraniteMoEShared into GraniteMoE for conversion

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <[email protected]>

* fix: Consolidate GraniteMoEShared into GraniteMoE on the C++ side

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <[email protected]>

---------

Signed-off-by: Gabe Goodhart <[email protected]>
…ggml-org#13460)

* mtmd : remove libllava, remove clip-quantize-cli

* rm clip_model_quantize
* batched-bench : fix pp batch contents

* metal : optimize multi-sequence FA vec kernel

ggml-ci

* metal : use FA-vec kernel up to batch size 20

ggml-ci
@jan-service-account merged commit f037995 into dev May 14, 2025
15 checks passed
@jan-service-account deleted the update-dev-from-master-2025-05-14-00-08 branch May 14, 2025 00:19