forked from ggml-org/llama.cpp
merge from upstream #58
Merged
Conversation
…re not in printable range (ggml-org#12178) (ggml-org#12338)

* Fix DOS index bug
* Remove new APIs
* Remove extra line
* Remove from API
* Add extra newline
* Update examples/server/server.cpp

Co-authored-by: Xuan-Son Nguyen <[email protected]>
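Purely as a hedged illustration of the kind of range check such a fix implies (the helper below is not the PR's code; the names and escaping scheme are assumptions):

```cpp
#include <cstdio>
#include <string>

// Hypothetical guard: a byte is used as-is only when it is printable ASCII;
// control bytes, DEL, and high bytes are escaped instead of being emitted raw.
static bool is_printable_ascii(unsigned char c) {
    return c >= 0x20 && c <= 0x7E;
}

static std::string escape_non_printable(const std::string & s) {
    std::string out;
    for (unsigned char c : s) {
        if (is_printable_ascii(c)) {
            out += (char) c;
        } else {
            char buf[8];
            std::snprintf(buf, sizeof(buf), "\\x%02X", c);
            out += buf;
        }
    }
    return out;
}
```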
…ml-org#12181)

* llama : refactor llama_context, llama_kv_cache, llm_build_context
* graph : don't mutate the KV cache during defrag
* context : reduce virtuals + remove test function
* context : move interface implementation to source file + factory
* graph : move KV cache build functions to llama_context impl
* graph : remove model reference from build_pooling
* graph : remove llama_model reference
* kv_cache : provide rope factors
* graph : rework inputs to use only unique_ptr, remove attn input abstraction
* context : remove llama_context_i abstraction
* context : clean-up
* graph : clean-up
* llama : remove redundant keywords (struct, enum)
* model : adapt gemma3
* graph : restore same attention ops as on master
* llama : remove TODO + fix indent
* llama : fix Gemma3 SWA KV cache shift
* hparams : add comment [no ci]
* llama : introduce llama_set_warmup() API call that controls warmup mode; use all MoE experts during warmup
* common : use new API to enable warmup mode during model warmup

Co-authored-by: Stanisław Szymczyk <[email protected]>
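llama_set_warmup() becomes a public llama.h call with this change; below is a minimal sketch of a warmup pass using it (the decode around it is illustrative, not the exact common.cpp code):

```cpp
#include "llama.h"

// Sketch: decode a single BOS token in warmup mode so the weights are touched
// and, for MoE models, all experts are exercised; then restore normal mode.
static void warmup(llama_context * ctx, const llama_model * model) {
    llama_set_warmup(ctx, true);   // the new API call from this change

    llama_token bos = llama_vocab_bos(llama_model_get_vocab(model));
    llama_decode(ctx, llama_batch_get_one(&bos, 1));

    llama_set_warmup(ctx, false);  // back to normal inference behavior
}
```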
…rg#12370)

The default is 4; sometimes we want to adjust this manually.

Signed-off-by: Eric Curtin <[email protected]>
…gml-org#12399)

* sycl : support non-contiguous tensors in binary ops
* sycl : silence unused variable warning

Co-authored-by: Stanisław Szymczyk <[email protected]>
* added -o option to specify an output file name
* llama-tts returns ENOENT in case of a file write error

Note: PR ggml-org#12042 is closed as superseded by this one.
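A hedged usage example with the new flag (only -o is introduced by this change; the model, vocoder, and prompt arguments are illustrative):

```console
$ llama-tts -m otts.gguf -mv vocoder.gguf -p "hello world" -o hello.wav
```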
This commit adds the --symlinks option to the zip command used to create the xcframework zip file. This is necessary to preserve symlinks in the zip file: without this option, the Versions symlink is stored as a regular directory entry rather than as a symlink, which causes the following error in Xcode:

```console
Couldn't resolve framework symlink for '/Users/danbev/work/ai/llama.cpp/tmp_1/build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/Versions/Current': readlink(/Users/danbev/work/ai/llama.cpp/tmp_1/build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/Versions/Current): Invalid argument (22)
```

Refs: ggml-org#11996 (comment)
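For context, the resulting zip invocation is along these lines (paths illustrative; --symlinks is the Info-ZIP option that stores symlinks as symlinks instead of following them):

```console
$ zip --symlinks -r llama.xcframework.zip build-apple/llama.xcframework
```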
* SYCL: set extras only on GGML_TYPE_Q4_0
* release tensor_extras in reset buffer interface
* cmake: Factor out compiler flag function from ggml

  llama.cpp's build requires it too, and we may want to make use of it without add_subdirectory(ggml).

* cmake: Enable building against system ggml

  This facilitates package maintenance for Linux distributions, where the libggml library will most likely be shipped as an individual package upon which a llama.cpp package depends.
…s checking (ggml-org#12273)

* vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking
* vulkan: subgroup size test
* Vulkan: Add device architecture enum and logic to recognize AMD generations
* vulkan: use new architecture logic to specify subgroup size
* Initial vulkan subgroup size tuning for RDNA3
* vulkan: commonize RDNA subgroup tuning
* vulkan: override subgroup size if required_subgroup_size = 0
* vulkan: disable warp 32 for RDNA3
* vulkan: fine tuned RDNA1 subgroup sizes
* vulkan: adjusted subgroup size map
* vulkan: fixed RDNA2 subgroup map

Co-authored-by: 0cc4m <[email protected]>
It's already found by FindVulkan.cmake in the parent CMakeLists
* Enable CUDA Graph on CTK < 12.x

  The `cudaGraphExecUpdate` API changed in 12.x, and for this reason CUDA graph support had been disabled on older CUDA toolkits. This change enables CUDA graph support with CTK < 12.x by using the older API when building against those versions.

* Fix compilation errors with MUSA
* Disable CUDA Graph for MUSA
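A sketch of the version gating this describes (the wrapper name is illustrative; the two signatures are the pre- and post-12.x forms of cudaGraphExecUpdate):

```cpp
#include <cuda_runtime.h>

// Update an instantiated CUDA graph, selecting the API form by toolkit version.
static cudaError_t update_graph_exec(cudaGraphExec_t exec, cudaGraph_t graph) {
#if CUDART_VERSION >= 12000
    // CUDA 12.x+: the result is reported through a single info struct
    cudaGraphExecUpdateResultInfo info;
    return cudaGraphExecUpdate(exec, graph, &info);
#else
    // CUDA 11.x and earlier: the older four-argument form
    cudaGraphNode_t error_node = nullptr;
    cudaGraphExecUpdateResult result;
    return cudaGraphExecUpdate(exec, graph, &error_node, &result);
#endif
}
```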
* ggml: Add op l2_norm
* ggml: Add op rwkv_wkv7
* llama: Add support for RWKV7 and ARWKV7 models
* llama: fix inference with RWKV6Qwen2
* llama: add more (a)rwkv7 variants in size
* Apply code-format changes
* fix MUSA build
* llama: fix shape error with rwkv using llama-parallel

Signed-off-by: Molly Sophia <[email protected]>
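For reference, the new l2_norm op presumably computes standard L2 normalization along a row; a sketch of the usual definition, with the epsilon placement being an assumption about ggml's exact implementation:

$$y_i = \frac{x_i}{\sqrt{\sum_j x_j^2 + \varepsilon}}$$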
…ion and driver issues (ggml-org#12434)
…e option (ggml-org#12371)

* alberto changes
* enable sycl graphs by env variable
* fixed compilation warnings in ggml-sycl.cpp
* renamed graph variables
* fix markdown in docs/backend/SYCL.md
* fix markdown in docs/backend/SYCL.md again
* compiling graphs by default, renamed graph_enable to graph_disable

Co-authored-by: Romain Biessy <[email protected]>
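Assuming the switch is exposed as an environment variable named GGML_SYCL_DISABLE_GRAPH (an inference from the graph_enable → graph_disable rename and the "enable sycl graphs by env variable" commit, not verified here), opting out of graphs would look something like:

```console
$ GGML_SYCL_DISABLE_GRAPH=1 ./build/bin/llama-cli -m model.gguf -p "hello"
```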
Labels: android, Apple Metal, build, devops, documentation, examples, ggml, Nvidia GPU, python, server, SYCL, testing, Vulkan