Releases · 2015aroras/llama.cpp

17 Sep 18:13

c959b67

b6498 Latest

CUDA: fix FA occupancy, optimize tile kernel (#15982)

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-09-17T18:14:00Z

llama-b6498-bin-macos-arm64.zip
sha256:0e5ad58fef2adb9f96e708a22125e756c49817ff90c9222ddfd52ccc26d3623c
11.2 MB 2025-09-17T18:14:12Z

llama-b6498-bin-macos-x64.zip
sha256:3b74b5b6f1cc83929dc43ae3c2ab3f740fa44d49cca1bd6f2904acfcbe002b8e
29.3 MB 2025-09-17T18:14:13Z

llama-b6498-bin-ubuntu-vulkan-x64.zip
sha256:5c0278737a19750c8cc641fbd5b5f86eba0583d4f9f063ee927f7cf52ee79988
26.1 MB 2025-09-17T18:14:14Z

llama-b6498-bin-ubuntu-x64.zip
sha256:aafc2d47e06de600d654069fab654508fc38b0784bbec4391f3c532c75110475
13.2 MB 2025-09-17T18:14:16Z

llama-b6498-bin-win-cpu-arm64.zip
sha256:f28cf25630f91a1a74aaca51a76b00eb0234e44fdffcceadb38984d0d7c5ef5d
11.4 MB 2025-09-17T18:14:17Z

llama-b6498-bin-win-cpu-x64.zip
sha256:180c596ac7c3b17bcb87388ee1c381f60fd49d02b24a560ad5dd834a849006d3
14.4 MB 2025-09-17T18:14:18Z

llama-b6498-bin-win-cuda-12.4-x64.zip
sha256:c78e739edf3a35460eeb83e87eac2b3d11eb6ce8f2595afe4ead1fb88e56d359
147 MB 2025-09-17T18:14:19Z

llama-b6498-bin-win-hip-radeon-x64.zip
sha256:a7a8752dd4118c479e4634ed95f6308f7c4f489fd71a61be5264fb59dd3f9504
319 MB 2025-09-17T18:14:24Z

llama-b6498-bin-win-opencl-adreno-arm64.zip
sha256:38ad49dde8e1eb695ac820cc5a0c0d4802d0490184cccfd97c22edde222e6cc1
11.8 MB 2025-09-17T18:14:32Z

Source code (zip)
2025-09-17T13:32:42Z

Source code (tar.gz)
2025-09-17T13:32:42Z

10 Sep 21:23

github-actions

b6445

00681df

CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance (#…

Assets 15

06 Jan 17:40

github-actions

b4430

ecebbd2

llama : remove unused headers (#11109)

ggml-ci

Assets 23

26 Nov 02:17

github-actions

b4173

0cc6375

Introduce llama-run (#10291)

It's like simple-chat, but it uses smart pointers to avoid manual
memory cleanup, so the code now has fewer memory leaks. It also avoids
printing multiple dots, splits the code into smaller functions, and
uses no exception handling.

Signed-off-by: Eric Curtin
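
The smart-pointer approach mentioned in this commit message can be illustrated with a minimal sketch. It is not the llama-run source: `session`, `session_create`, `session_free`, and `use_session` are hypothetical names standing in for any C-style API that pairs a create function with a free function. The sketch wraps the raw pointer in a `std::unique_ptr` with a custom deleter, so every return path releases the resource without a manual cleanup call and without exception handling.

```cpp
#include <memory>

// Hypothetical C-style API (an assumption for illustration, not from llama.cpp).
struct session {
    int id;
};

static int g_live_sessions = 0; // number of sessions created but not yet freed

session * session_create(int id) {
    ++g_live_sessions;
    return new session{id};
}

void session_free(session * s) {
    if (s) {
        --g_live_sessions;
        delete s;
    }
}

// Deleter that lets std::unique_ptr call the C-style free function.
struct session_deleter {
    void operator()(session * s) const { session_free(s); }
};

using session_ptr = std::unique_ptr<session, session_deleter>;

// Returns the session id, or -1 on the early-exit path.
// Both paths free the session automatically when `s` goes out of scope.
int use_session(int id, bool fail_early) {
    session_ptr s(session_create(id));
    if (fail_early) {
        return -1; // error reported by return code; no manual session_free
    }
    return s->id;
}

int live_sessions() { return g_live_sessions; }
```

Because the deleter runs whenever the `session_ptr` leaves scope, adding another early return to `use_session` cannot introduce a leak, which is the kind of manual cleanup the commit message says it avoids.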

Assets 21

25 Nov 17:44

github-actions

b4165

a9a678a

Add download chat feature to server chat (#10481)

* Add download chat feature to server chat

Add a download feature next to the delete chat feature in the server Vue chat interface.

* code style

---------

Co-authored-by: Xuan Son Nguyen

Assets 21

15 Nov 19:43

github-actions

b4091

09ecbcb

cmake : fix ppc64 check (whisper/0)

ggml-ci

Assets 21
