Skip to content

Releases: z80maniac/llama.cpp

b1663

20 Dec 17:17
799fc22

Choose a tag to compare

CUDA: Faster Mixtral prompt processing (#4538)

* CUDA: make MoE tensors contiguous for batch size>1

* Update ggml-cuda.cu

Co-authored-by: slaren <[email protected]>

---------

Co-authored-by: slaren <[email protected]>

b1644

16 Dec 15:19
8a5be3b

Choose a tag to compare

llama : sanity checks for access to logits (#4274)

Co-authored-by: Georgi Gerganov <[email protected]>

b1643

15 Dec 18:12
88ae895

Choose a tag to compare

server : add optional API Key Authentication example (#4441)

* Add API key authentication for enhanced server-client security

* server : to snake_case

---------

Co-authored-by: Georgi Gerganov <[email protected]>

b1620

08 Dec 18:06
fe680e3

Choose a tag to compare

sync : ggml (new ops, tests, backend, etc.) (#4359)

* sync : ggml (part 1)

* sync : ggml (part 2, CUDA)

* sync : ggml (part 3, Metal)

* ggml : build fixes

ggml-ci

* cuda : restore lost changes

* cuda : restore lost changes (StableLM rope)

* cmake : enable separable compilation for CUDA

ggml-ci

* ggml-cuda : remove device side dequantize

* Revert "cmake : enable separable compilation for CUDA"

This reverts commit 09e35d04b1c4ca67f9685690160b35bc885a89ac.

* cuda : remove assert for rope

* tests : add test-backend-ops

* ggml : fix bug in ggml_concat

* ggml : restore `ggml_get_n_tasks()` logic in `ggml_graph_plan()`

* ci : try to fix macOS

* ggml-backend : remove backend self-registration

* ci : disable Metal for macOS cmake build

ggml-ci

* metal : fix "supports family" call

* metal : fix assert

* metal : print resource path

ggml-ci

---------

Co-authored-by: slaren <[email protected]>