merge from upstream #64

l3utterfly · 2025-05-19T07:53:01Z

Make sure to read the contributing guidelines before submitting a PR

ggml-ci

…org#13174) * Prefilling assistant message in openai compatible API * fixed indentation * fixed code convention * simplify method usage * no more than one assistant message at end of messages * merge checks into prefill code * Update examples/server/utils.hpp --------- Co-authored-by: matteo <[email protected]> Co-authored-by: Xuan-Son Nguyen <[email protected]>

* whisper: suppress Windows compiler warnings This commit disables compiler warnings on Windows when using MSVC. The motivation for these changes is that some compilers, for example Windows MSVC, generate warnings for these conversions, and there are quite a few of them. This makes it a little difficult to spot new warnings that may be introduced, and it can also be difficult for users/embedders of ggml, where these warnings are hard to separate from their own warnings. * squash! whisper: suppress Windows compiler warnings Move ggml related warnings into ggml. This commit also fixes the indentation and adds a missing whitespace to the if statement.

This commit adds a check to make sure that the target exists before trying to add compile options to ignore warnings when using MSVC. The motivation for this is that the build is currently broken depending on the cmake options provided. With this fix it should be possible to build even if the targets are not actually available. Refs: ggml-org/whisper.cpp#3090 (comment)

* vulkan: Add bfloat16 support This adds bfloat16 matrix multiply support based on VK_KHR_shader_bfloat16. The extension is required for coopmat multiply support, but matrix-vector multiply trivially promotes bf16 to fp32 and doesn't require the extension. The copy/get_rows shaders also don't require the extension. It's probably possible to fall back to non-coopmat and promote to fp32 when the extension isn't supported, but this change doesn't do that. The coopmat support also requires a glslc that supports the extension, which currently requires a custom build. * vulkan: Support bf16 tensors without the bf16 extension or coopmat support Compile a variant of the scalar mul_mm shader that will promote the bf16 values to float, and use that when either the bf16 extension or the coopmat extensions aren't available. * vulkan: bfloat16 fixes (really works without bfloat16 support now) * vulkan: fix spirv-val failure and reenable -O
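The note above that matrix-vector multiply "trivially promotes bf16 to fp32" can be illustrated on the CPU side. This is a minimal sketch in C++, not the Vulkan shader code from the commit: it relies only on the fact that a bfloat16 value is the upper 16 bits of an IEEE-754 binary32 value. The names `bf16_to_fp32` and `dot_bf16_fp32` are hypothetical and chosen for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Promote a raw bf16 bit pattern to fp32.
// bf16 shares fp32's sign and exponent layout, so the promotion is exact:
// place the 16 bits in the upper half of a 32-bit word and zero the rest.
static inline float bf16_to_fp32(uint16_t bits) {
    uint32_t wide = static_cast<uint32_t>(bits) << 16;
    float out;
    std::memcpy(&out, &wide, sizeof(out));  // bit-cast without aliasing issues
    return out;
}

// Hypothetical helper: one row of a matrix-vector product where the matrix
// row is stored as bf16 and the vector and accumulator are fp32.
static inline float dot_bf16_fp32(const uint16_t* row, const float* vec, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc += bf16_to_fp32(row[i]) * vec[i];
    }
    return acc;
}
```

Because the promotion is a plain shift, no hardware bf16 arithmetic is needed for this path, which is consistent with the commit's statement that only the coopmat multiply requires the extension.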

JohannesGaessler and others added 30 commits April 29, 2025 16:00

CUDA: fix non-cont. inputs for batched mat mul (ggml-org#13155)

cdf7658

llama-bench: fixed size of fields to correctly map to values (ggml-or…

5a63980

…g#13183)

sampling : when top-k <= 0 -> noop (ggml-org#13173)

d9d398f

ggml-ci

scripts: n_depth for compare-llama-bench [no ci] (ggml-org#13201)

19e899c

rpc : fix cache directory initialization (ggml-org#13188)

a0f7016

Signed-off-by: xiaofei <[email protected]>

docker : do not build tests (ggml-org#13204)

da84c04

* docker : do not build tests * include "ggml-cpu.h"

arg : allow using -hf offline (ggml-org#13202)

5933e6f

* arg : allow using -hf offline * add more comments in code [no ci]

feat(ggml-cpu): enable z17 compile (ggml-org#13182)

44cd8d9

z17 compilation requires GCC 15.1.0 and onwards Signed-off-by: Aaron Teo <[email protected]>

convert : correct typo image_mean --> image_std (ggml-org#13208)

07c2e2f

ggml : fix ppc64le build (ggml-org#13176)

4163137

Build fails with compilation error on power pc. This patch fixes the same. Tested with unit tests run via --build <build_dir> && cd <build_dir> && make test Signed-off-by: Shalini Salomi Bodapati <[email protected]>

vulkan: use uint array index to avoid glslang bug (ggml-org#13193)

e5007a5

common : add -jf / --json-schema-file flag (ggml-org#12011)

3b127c7

llava : remove duplicate include (ggml-org#13207)

ceda28e

convert : improve model arch handling (ggml-org#13122)

3e168be

* convert : improve model arch handling * use AutoConfig * rm trust_remote_code * Update convert_hf_to_gguf.py * fix self.block_count for vision * fix NomicBertModel

fix typo: n_ctx_pre_seq -> n_ctx_per_seq (ggml-org#13221)

16a457f

arg : -hf do not fail if url mismatch (ggml-org#13219)

6f67cf1

* arg : -hf do not fail if url mismatch * do not return if cannot parse metadata json

CUDA: batched+noncont MMQ, refactor bs>1 MoE code (ggml-org#13199)

e1e8e09

cuda : fix unused variable compile warning (whisper/0)

9998540

ggml-ci

ggml : fix ggml_gallocr_ptr type (ggml/1205)

4254bb4

sync : ggml

8d33d74

llama-model : fix the reported size class for nomic-embed-text-v2-moe (…

a70183e

…ggml-org#13223)

arg : remove CURLINFO_EFFECTIVE_METHOD (ggml-org#13228)

13c9a33

mtmd : add **vision** support for Mistral Small 3.1 (ggml-org#13231)

8936784

* convert ok * load ok, missing patch merger * ah sheet it works * update llava/readme * add test * fix test

sync : ggml

b1dd4d0

ggml-ci

test: non-cont. b in test-backend-ops -o MUL_MAT (ggml-org#13187)

b0ecbd4

vulkan: Handle src1 batch dimension in non-contiguous mat-vec-mul sha…

fc727bc

…der (ggml-org#13191) * vulkan: Handle src1 batch dimension in non-contiguous mat-vec-mul shader

ochafik and others added 16 commits May 15, 2025 23:29

minja: sync (qwen3) (ggml-org#13573)

bc098c3

* minja: sync google/minja@f06140f - google/minja#67 (@grf53) - google/minja#66 (@taha-yassine) - google/minja#63 (@grf53) - google/minja#58 --------- Co-authored-by: ochafik <[email protected]>

sycl : fixed compilation warnings (ggml-org#13582)

0a338ed

ci : add ppc64el to build-linux-cross (ggml-org#13575)

7c07ac2

llama : print hint when loading a model when no backends are loaded (g…

5364ae4

…gml-org#13589)

metal : add FA-vec kernel for head size 64 (ggml-org#13583)

654a677

ggml-ci

releases : use arm version of curl for arm releases (ggml-org#13592)

415e40a

readme : add list of dependencies and their license (ggml-org#13591)

06c1e4a

webui : improve accessibility for visually impaired people (ggml-org#…

aea9f8b

…13551) * webui : improve accessibility for visually impaired people * add a11y for extra contents * fix some labels being read twice * add skip to main content

server : do not return error out of context (with ctx shift disabled) (…

6aa892e

…ggml-org#13577)

llguidance : official v0.7.20 release (no actual changes) [noci] (ggm…

3e0be1c

…l-org#13594)

vulkan: use scalar FA rather than coopmat2 when N==1 (ggml-org#13554)

4f41ee1

vulkan: move common FA code to flash_attn_base.comp (ggml-org#13556)

2f5a4e1

* vulkan: move common FA code to flash_attn_base.comp * vulkan: move common FA index/stride setup code to flash_attn_base.comp * build fix

parallel : add option for non-shared and larger prompts (ggml-org#13598)

518329b

* parallel : add option for non-shared and larger prompts * parallel : update readme [no ci] * cont : add note about base models [no ci] * parallel : better var name ggml-ci

cmake: use the current build config for vulkan-shaders-gen (ggml-org#…

e3a7cf6

…13595) * fix: use the current build config for `vulkan-shaders-gen` * fix: only pass a valid build type to `--config`

server : added --no-prefill-assistant flag (ggml-org#13608)

6a2bc8b

* added no-prefill-assistant flag * reworded documentation comment * updated server README.md

CANN: Support MOE Model MUL_MAT_ID (ggml-org#13042)

33d7aed

Signed-off-by: noemotiovon <[email protected]>

github-actions bot added the documentation, SYCL, Nvidia GPU, Vulkan, testing, build, examples, devops, python, server, ggml, Apple Metal, and script labels on May 19, 2025

l3utterfly merged commit 140d3f1 into layla-build May 19, 2025
66 checks passed

59 participants
