Releases · ggml-org/llama.cpp
b6141
b6140
HIP: disable sync warp shuffle operators from clr amd_warp_sync_funct…
b6139
sycl: Fix and disable more configurations of mul_mat (#15151)
b6138
opencl: allow mixed f16/f32 `add` (#15140)
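For context, a mixed f16/f32 `add` means one operand is stored in half precision and the other in single precision. Below is a minimal sketch, not taken from the PR, of what such an add looks like at the ggml graph level; it runs on the CPU backend only for illustration (the OpenCL path is not exercised), and the header split between `ggml.h` and `ggml-cpu.h`, the tensor sizes, and the arena size are assumptions that may differ across ggml versions.

```c
// Minimal sketch (illustrative, not from the PR): adding an F16 tensor to an
// F32 tensor through the ggml graph API, computed on the CPU backend.
#include "ggml.h"
#include "ggml-cpu.h"   // ggml_graph_compute_with_ctx (header location varies by ggml version)

#include <stdio.h>

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16u * 1024 * 1024,  // small arena for tensors + graph
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // a is half precision, b is single precision
    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F16, 8);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);

    ggml_fp16_t * a_data = (ggml_fp16_t *) a->data;
    float       * b_data = (float       *) b->data;
    for (int i = 0; i < 8; ++i) {
        a_data[i] = ggml_fp32_to_fp16(1.0f);
        b_data[i] = 2.0f;
    }

    // mixed f16/f32 add; the result takes the type of the first operand (F16 here)
    struct ggml_tensor * c = ggml_add(ctx, a, b);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads =*/ 1);

    const ggml_fp16_t * c_data = (const ggml_fp16_t *) c->data;
    printf("c[0] = %.1f\n", ggml_fp16_to_fp32(c_data[0]));  // expect 3.0

    ggml_free(ctx);
    return 0;
}
```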
b6137
CUDA cmake: add `-lineinfo` for easier debug (#15260)
b6136
CANN: GGML_OP_CPY optimization (#15070)
b6135
musa: fix failures in test-backend-ops for mul_mat_id op (#15236)
b6134
CANN: Add broadcast for softmax and FA (#15208)
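For context, "broadcast for softmax" refers to letting a single attention mask be reused across the head (and batch) dimensions of the logits instead of requiring one mask per head. The sketch below only shows what such a graph looks like when built with the public ggml C API; the shapes, the scale value, and stopping at graph construction are illustrative assumptions, not code from the PR (which targets the CANN backend).

```c
// Minimal sketch (illustrative, not from the PR): a softmax whose 2D mask is
// broadcast across the head dimension of a 3D scores tensor. Graph construction only.
#include "ggml.h"

#include <stdio.h>

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 32u * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    const int64_t n_kv     = 64;   // keys/values per attention row
    const int64_t n_tokens = 16;   // query tokens
    const int64_t n_head   = 8;    // attention heads

    // attention scores: one [n_kv, n_tokens] matrix per head
    struct ggml_tensor * scores = ggml_new_tensor_3d(ctx, GGML_TYPE_F32, n_kv, n_tokens, n_head);

    // a single mask shared by all heads - this is where broadcasting applies
    struct ggml_tensor * mask = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_kv, n_tokens);

    // scaled, masked softmax; the scale and max_bias values are illustrative
    struct ggml_tensor * probs = ggml_soft_max_ext(ctx, scores, mask, 0.125f, 0.0f);

    printf("probs shape: [%lld, %lld, %lld]\n",
           (long long) probs->ne[0], (long long) probs->ne[1], (long long) probs->ne[2]);

    ggml_free(ctx);
    return 0;
}
```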
b6133
mtmd : Fix MinicpmV model converter and clip to avoid using hardcode.…
b6132
chat : hotfix gpt-oss jinja raising an exception (#15243)