Sync master with upstream release b4920 #12

jan-service-account · 2025-03-19T06:17:16Z

Updates dev branch with latest release (b4920) from ggml-org/llama.cpp

* cmake: Factor out compiler flag function from ggml

  llama.cpp's build requires it, too, and we may want to make use of it without add_subdirectory(ggml).

* cmake: Enable building against system ggml

  This facilitates package maintenance for Linux distributions, where the libggml library will most likely be shipped as an individual package upon which a llama.cpp package depends.


* vulkan: subgroup size test
* Vulkan: Add device architecture enum and logic to recognize AMD generations
* vulkan: use new architecture logic to specify subgroup size
* Initial vulkan subgroup size tuning for RDNA3
* vulkan: commonize RDNA subgroup tuning
* vulkan: override subgroup size if required_subgroup_size = 0
* vulkan: disable warp 32 for RDNA3
* vulkan: fine tuned RDNA1 subgroup sizes
* vulkan: adjusted subgroup size map
* vulkan: fixed RDNA2 subgroup map

---------

Co-authored-by: 0cc4m <[email protected]>


* Enable CUDA Graph on CTK < 12.x

  The `cudaGraphExecUpdate` API changed in 12.x, so CUDA graph support was disabled on older CUDA toolkits. This change enables CUDA graph support on CTK versions < 12.x by using the older API there.

* Fix compilation errors with MUSA
* Disable CUDA Graph for MUSA


* ggml: Add op l2_norm
* ggml: Add op rwkv_wkv7
* llama: Add support for RWKV7 and ARWKV7 models
* llama: fix inference with RWKV6Qwen2
* llama: add more (a)rwkv7 variants in size
* Apply code-format changes
* fix MUSA build
* llama: fix shape error with rwkv using llama-parallel

---------

Signed-off-by: Molly Sophia <[email protected]>
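As a rough illustration of what an L2-normalization op computes, the sketch below scales a vector by the inverse of its Euclidean norm, clamped by a small epsilon so that an all-zero input does not divide by zero. The function name `l2_normalize`, the default epsilon, and the clamping rule are assumptions made for this example; the commits above do not spell out the exact formula ggml uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not ggml's API): y = x / max(||x||_2, eps).
std::vector<float> l2_normalize(const std::vector<float>& x, float eps = 1e-12f) {
    assert(eps > 0.0f);  // eps must be positive so the division is always defined

    // Accumulate the sum of squares in double to limit rounding error.
    double sum_sq = 0.0;
    for (float v : x) {
        sum_sq += static_cast<double>(v) * v;
    }

    // Clamp the norm from below, then scale every element by its inverse.
    const float norm  = static_cast<float>(std::sqrt(sum_sq));
    const float scale = 1.0f / std::max(norm, eps);

    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale;
    }
    return y;
}
```

Calling `l2_normalize({3.0f, 4.0f})` yields a vector of unit length pointing in the same direction; an all-zero input stays all-zero because of the epsilon clamp.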


…e option (ggml-org#12371)

* alberto changes
* enable sycl graphs by env variable
* fixed compilation warnings in ggml-sycl.cpp
* renamed graph variables
* fix markdown in docs/backend/SYCL.md
* fix markdown in docs/backend/SYCL.md again
* compiling graphs by default, renamed graph_enable to graph_disable

---------

Co-authored-by: Romain Biessy <[email protected]>
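The last item above describes an "on by default, opt out via environment variable" pattern. A minimal sketch of that pattern follows, assuming the variable holds an integer and any nonzero value disables graphs. The function names and the parsing rule are hypothetical, and the actual variable name is left as a parameter because the commit text above does not state it.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: decide whether graphs are enabled from the raw
// value of a "disable" variable. Unset means enabled (the default).
inline bool graphs_enabled_from_value(const char* raw) {
    if (raw == nullptr) {
        return true;                // variable not set: keep the default (enabled)
    }
    return std::atoi(raw) == 0;     // "0" keeps graphs on, nonzero turns them off
}

// Hypothetical wrapper that reads the variable from the process environment.
inline bool graphs_enabled(const char* var_name) {
    assert(var_name != nullptr);
    return graphs_enabled_from_value(std::getenv(var_name));
}
```

Naming the flag after the non-default action ("disable") means an unset variable and a value of `0` both leave the feature on, which matches the rename from `graph_enable` to `graph_disable` described above.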


ckastner and others added 23 commits March 17, 2025 11:05

vulkan: Adjust coopmat2 tile sizes and selection heuristic (ggml-org#12258)

2f21123

vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking (ggml-org#12273)

891c639

* vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking
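The general idea behind padding a matrix dimension to avoid bounds checks can be sketched as follows: round the dimension up to a multiple of the tile size so that every tile is complete and a kernel never needs to test for a partial tile. The helper names and the tile size used in the usage note are hypothetical; the commit does not state the values the Vulkan backend actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round n up to the next multiple of tile.
inline uint32_t pad_to_multiple(uint32_t n, uint32_t tile) {
    assert(tile > 0);  // a zero tile size would divide by zero
    return (n + tile - 1) / tile * tile;
}

// Number of tiles covering the padded dimension. Because the padded size
// is an exact multiple of tile, every tile is full.
inline uint32_t tiles_needed(uint32_t n, uint32_t tile) {
    return pad_to_multiple(n, tile) / tile;
}
```

For example, with an assumed tile size of 32, `pad_to_multiple(100, 32)` gives 128, so four full tiles cover the dimension. The cost is 28 columns of padding that have to be allocated and ignored in the result.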

vulkan: use fp32 in coopmat2 q4_k dequant function (ggml-org#12309)

f07690c

vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (ggml-org#12312)

484a8ab

ggml-vulkan: remove unused find_program(glslc) (ggml-org#12416)

01e8f21

It's already found by FindVulkan.cmake in the parent CMakeLists

docs : bring llama-cli conversation/template docs up-to-date (ggml-org#12426)

60c9029

fixed compilation warnings in ggml-sycl (ggml-org#12424)

a53f7f7

Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentation and driver issues (ggml-org#12434)

fd123cf

ggml : add SVE support for q6_K_q8_K (ggml-org#12361)

d9a1452

cmake : fix PowerPC build (ggml-org#12241)

eba92d6

Closes ggml-org#12240

server : fix warmup draft cache type (ggml-org#12446)

810e0af

ggml-ci

context : always use non-causal attention for encoder graphs (ggml-org#12447)

8551c44

* context : always use non-causal attention for encoder graphs

  ggml-ci

* context : move the change to llama_context::encode()

  ggml-ci
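To make the causal versus non-causal distinction concrete, the sketch below builds an additive attention mask for `n_tokens` positions. With a causal mask, position `i` may only attend to positions `j <= i`; a non-causal mask, as an encoder would use, lets every position attend to every other. The function name and the flat row-major layout are assumptions for illustration, not llama.cpp's actual mask code.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: additive attention mask in row-major order, where
// entry (i, j) is 0 if token i may attend to token j and -inf otherwise.
std::vector<float> build_attention_mask(std::size_t n_tokens, bool causal) {
    assert(n_tokens > 0);
    const float neg_inf = -std::numeric_limits<float>::infinity();

    // Start with everything visible; this is already the non-causal mask.
    std::vector<float> mask(n_tokens * n_tokens, 0.0f);

    if (causal) {
        // Hide future positions: token i must not see any token j > i.
        for (std::size_t i = 0; i < n_tokens; ++i) {
            for (std::size_t j = i + 1; j < n_tokens; ++j) {
                mask[i * n_tokens + j] = neg_inf;
            }
        }
    }
    return mask;
}
```

The mask is added to the attention scores before the softmax, so `-inf` entries receive zero weight. Passing `causal = false` leaves the mask all zeros, so every token sees the full input.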

llama : add support for EXAONE tied word embeddings (ggml-org#12451)

99aa304

speculative : fix seg fault in certain cases (ggml-org#12454)

c6af216

llama : support converting Mistral Small text-only (ggml-org#12450)

29fff30

musa: override warp_size of musa device to 32 (ggml-org#12445)

bb115d2

Signed-off-by: Xiaodong Ye <[email protected]>

graph : normalize Q, K, V shapes + sync cross attention (ggml-org#12449)

75422e8

* graph : normalize Q, K, V shapes and add comments

  ggml-ci

* context : synchronize before getting cross attention data
* model : fix command-r attention norm check

opencl: improve profiling (ggml-org#12442)

d84635b

* opencl: more profiling timing
* opencl: generate trace for profiling
* opencl: reduce profiling overhead
* Populate profiling timing info at the end rather than after each kernel run
* opencl: fix for chrome tracing

github-actions bot added Apple Metal SYCL Nvidia GPU Vulkan testing build examples labels Mar 19, 2025

github-actions bot added python ggml documentation server labels Mar 19, 2025

vansangpfiev closed this Mar 19, 2025

Minh141120 deleted the update-dev-from-master-2025-03-19-06-17 branch March 20, 2025 04:14


18 participants
