Releases · Gadflyii/llama.cpp

11 Oct 23:33

11f0af5

b6736 Latest

CUDA: faster tile FA, add oob checks, more HSs (#16492)

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-10-11T23:33:23Z

llama-b6736-bin-macos-arm64.zip
sha256:3aea9a95b024f9d3b48aeb864e267b82f0583478063e2858020744b56723d639
10.4 MB 2025-10-11T23:33:37Z

llama-b6736-bin-macos-x64.zip
sha256:1e0bc12464843a696f0373f0a1f1dfd2131234114d771cfb077dc550f9b7daef
26.9 MB 2025-10-11T23:33:38Z

llama-b6736-bin-ubuntu-vulkan-x64.zip
sha256:4e3874bd9dc2301d34ef14e8b6231e972c09a55eb4df77b2f649867c9bdf2ccb
25.5 MB 2025-10-11T23:33:40Z

llama-b6736-bin-ubuntu-x64.zip
sha256:48611bf9df2ce0b2ca34434bd93b0603b173abe5ecc6f8c952e5a3fe619fea0a
12.4 MB 2025-10-11T23:33:41Z

llama-b6736-bin-win-cpu-arm64.zip
sha256:e05aa3d7eabfb408391c3cb8f38da71ba062435089d6856dc92c76f172763a6b
10.6 MB 2025-10-11T23:33:42Z

llama-b6736-bin-win-cpu-x64.zip
sha256:6ce3ffbcc3c6a5847a69b4f7ff47ba67d14f37ff6af4b9b658a8dab2daa5d243
13.6 MB 2025-10-11T23:33:43Z

llama-b6736-bin-win-cuda-12.4-x64.zip
sha256:239baae9093623b705fa33b1e682610aedf31f715e767659c1caccc060069c04
161 MB 2025-10-11T23:33:45Z

llama-b6736-bin-win-hip-radeon-x64.zip
sha256:43aa4eb9b22e8510e65b59b910b679740ea7a1bdd873a5fabb2e408d7dc8b773
321 MB 2025-10-11T23:33:51Z

llama-b6736-bin-win-opencl-adreno-arm64.zip
sha256:a19ab2a118694f8ad9268fb5fe7766486c8a1a23049874d3e5cf07377fb9c759
11 MB 2025-10-11T23:34:03Z

Source code (zip)
2025-10-11T18:54:32Z

Source code (tar.gz)
2025-10-11T18:54:32Z
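
Each binary asset above is published with a sha256 digest, so a download can be checked before it is unpacked. The following is a minimal sketch of that check using only the Python standard library. The helper names `sha256_of` and `matches_published` are illustrative and not part of llama.cpp, and the example assumes the archive was saved in the current directory under its published file name.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex sha256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published):
    """Compare a file against a digest written as 'sha256:<hex>' or as bare hex."""
    expected = published.split(":", 1)[-1].strip().lower()
    return sha256_of(path) == expected


if __name__ == "__main__":
    # Assumed local file name; the digest is the one listed for this asset above.
    archive = "llama-b6736-bin-macos-arm64.zip"
    published = "sha256:3aea9a95b024f9d3b48aeb864e267b82f0583478063e2858020744b56723d639"
    if os.path.exists(archive):
        print("digest matches" if matches_published(archive, published) else "digest MISMATCH")
    else:
        print("archive not found:", archive)
```

A mismatch usually means a truncated or corrupted download, so the safest response is to discard the file and fetch it again.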

10 Oct 19:02

github-actions

b6729

81086cd

vocab : mark EOT token for Granite models (#16499)

* vocab : mark EOT token for Granite models

* sampling : fallback to EOS when EOT is not found
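
The second bullet describes a fallback rule: prefer the end-of-turn (EOT) token, and use end-of-sequence (EOS) when the vocabulary defines no EOT. The sketch below illustrates that rule only and is not the llama.cpp implementation. The function name and the use of `None` for "token not defined" are assumptions made for this example.

```python
from typing import Optional


def pick_stop_token(eot_id: Optional[int], eos_id: Optional[int]) -> Optional[int]:
    """Hypothetical helper: return EOT when the vocab defines it, otherwise EOS.

    None stands in for "this token is not defined in the vocabulary".
    """
    if eot_id is not None:
        return eot_id
    return eos_id


if __name__ == "__main__":
    print(pick_stop_token(eot_id=7, eos_id=2))     # EOT is defined, so it wins
    print(pick_stop_token(eot_id=None, eos_id=2))  # no EOT, so fall back to EOS
```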

Assets 15

09 Oct 16:48

github-actions

b6725

a6cb7c8

Merge pull request #9 from ggml-org/master

Merge from upstream

Assets 15

07 Oct 11:30

github-actions

b6708

dec0c8d

Merge pull request #8 from ggml-org/master

merge from upstream

Assets 15

06 Oct 22:34

github-actions

b6700

3df2244

llama : add --no-host to disable host buffers (#16310)

* implement --no-host to disable host buffer

* fix equal_mparams

* move no-host enumeration order together with other model params

---------

Co-authored-by: slaren

Assets 15

30 Sep 15:46

github-actions

b6647

be84153

implement --no-host to disable host buffer

Assets 15

30 Sep 15:37

github-actions

b6646

364a7a6

common : remove common_has_curl() (#16351)

`test-arg-parser.cpp` has been updated to work consistently,
regardless of whether CURL or SSL support is available, and
now always points to `ggml.ai`.

The previous timeout test has been removed, but it can be
added back by providing a dedicated URL under `ggml.ai`.

Signed-off-by: Adrien Gallouët

Assets 15

29 Sep 17:37

github-actions

b6649

4b224c1

Merge branch 'ggml-org:master' into master

Assets 15
