Releases · ngxson/llama.cpp
b6334
llama : separate compute buffer reserve from fattn check (#15696)
Exposes ggml_backend_sched_split_graph() to allow splitting the graph without allocating compute buffers, and uses it to split the graph for the automatic Flash Attention check.
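A minimal sketch of how a caller might use the exposed ggml_backend_sched_split_graph() to split a graph for inspection without reserving compute buffers. The exact signature and the surrounding helper are assumptions modeled on the other ggml_backend_sched_* functions, not the actual llama.cpp code.

```c
// Hedged sketch only: split the graph across backends without allocating
// compute buffers, so a cheap capability check (e.g. for Flash Attention)
// can run before the real graph is reserved.
#include "ggml-backend.h"

static void check_fattn_support(ggml_backend_sched_t sched, struct ggml_cgraph * gf) {
    // Assumed signature: splits the graph, does NOT reserve compute buffers.
    ggml_backend_sched_split_graph(sched, gf);

    // ... walk the resulting splits to decide whether Flash Attention is
    // supported on the assigned backends, then reset the scheduler and
    // reserve compute buffers for the graph that will actually run ...
}
```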
b6332
vulkan: handle large sizes for get_rows (#15686)
b6331
vulkan: mul_mat_id coopmat2 optimizations (#15546)
* Add a path for when the tile fits in BN/2, similar to what we have for mul_mat. Only call fetch_scales/store_scales once per QUANT_K block, and once at the beginning in case start_k is not aligned.
* Also add a path for BN/4, worth a couple more percent.
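A rough C-style sketch of the scale-fetch hoisting described in the first bullet. The real change is in the Vulkan coopmat2 shader, so QUANT_K, fetch_scales() and the loop shape here are illustrative stand-ins, not the actual GLSL.

```c
// Illustrative only: call fetch_scales() once per QUANT_K block (plus once up
// front when start_k is not block-aligned) instead of on every k iteration.
#define QUANT_K 256                      // placeholder block size for illustration

static void fetch_scales(int k) { (void) k; /* load scales for the block containing k */ }

static void mul_mat_tile(int start_k, int end_k) {
    if (start_k % QUANT_K != 0) {
        fetch_scales(start_k);           // unaligned prefix: fetch once at the start
    }
    for (int k = start_k; k < end_k; ++k) {
        if (k % QUANT_K == 0) {
            fetch_scales(k);             // once per QUANT_K block
        }
        // ... multiply-accumulate for this k slice ...
    }
}
```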
b6330
vulkan : remove unused portability_enumeration_ext variable (#15679)
This commit removes the portability_enumeration_ext variable from the ggml_vk_instance_portability_enumeration_ext_available function, as it is initialized to false but never modified, making it redundant.
b6329
vulkan: Allow fallback to sysmem memory when vidmem is full (#15649)
* Allow falling back to system memory (sysmem) when video memory (vidmem) is full
* Add env var GGML_VK_ALLOW_SYSMEM_FALLBACK
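A hedged C sketch of the fallback pattern: try the device-local allocation first and fall back to host-visible memory when it fails. Only the GGML_VK_ALLOW_SYSMEM_FALLBACK name comes from the release note; the allocator helpers and the way the env var gates the fallback are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical allocators standing in for the Vulkan memory types.
static void * alloc_vidmem(size_t size) { (void) size; return NULL; } // device-local
static void * alloc_sysmem(size_t size) { return malloc(size); }      // host-visible

static bool sysmem_fallback_allowed(void) {
    // Assumption: the env var opts in to the fallback when set to a non-"0" value.
    const char * v = getenv("GGML_VK_ALLOW_SYSMEM_FALLBACK");
    return v != NULL && strcmp(v, "0") != 0;
}

static void * alloc_buffer(size_t size) {
    void * buf = alloc_vidmem(size);
    if (buf == NULL && sysmem_fallback_allowed()) {
        buf = alloc_sysmem(size); // slower, but avoids failing the allocation outright
    }
    return buf;
}
```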
b6328
vulkan: clamp matmul and FA results to the max finite value (#15652)
* Clamp matmul and Flash Attention (FA) results to the max finite value
* Only clamp for fp16
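A small C sketch of the clamping idea: values are clamped to the largest finite fp16 value (65504) so fp16 outputs cannot overflow to infinity. The actual clamp lives in the Vulkan matmul and Flash Attention shaders; this is only an illustration of the operation.

```c
#include <math.h>

// Illustrative only; the real clamp is applied in the Vulkan shaders, and
// only when results are stored as fp16.
#define FP16_MAX 65504.0f  // largest finite IEEE-754 half-precision value

static float clamp_to_fp16_finite(float x) {
    return fminf(fmaxf(x, -FP16_MAX), FP16_MAX);
}
```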
b6327
ggml: update kleidiai to v1.13.0 (#15663)
b6325
llama: use FA + max. GPU layers by default (#15434)
* llama: use the maximum number of GPU layers by default, and make Flash Attention (-fa) automatic
* ggml-backend: abort instead of segfaulting
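A hedged sketch of what the new defaults roughly correspond to in terms of the old explicit settings. The llama_model_default_params()/n_gpu_layers names come from the public llama.h API, but treat the exact fields and values as assumptions rather than the code this release touches.

```c
#include "llama.h"

int main(void) {
    // Previously, offloading everything required explicit opt-in like this;
    // with this release a comparable behaviour is the default.
    struct llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 999; // "max. GPU layers": offload as many layers as fit

    // Flash Attention is now chosen automatically (the old boolean -fa flag
    // becomes an "auto" mode), so no explicit context flag is shown here.
    (void) mparams;
    return 0;
}
```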
b6324
CUDA: use FP32 arithmetic for conv2d (#15683)
b6323
vulkan: Skip syncing for prealloc_y when it is reused (#15544)