Releases: aizip/llama.cpp

b3631

26 Aug 18:46
7d787ed

ggml : do not crash when quantizing q4_x_x with an imatrix (#9192)

b3618

23 Aug 18:28
3ba780e

lora : fix llama conversion script with ROPE_FREQS (#9117)

b3599

16 Aug 23:10
8b3befc

server : refactor middleware and /health endpoint (#9056)

* server : refactor middleware and /health endpoint

* move "fail_on_no_slot" to /slots

* Update examples/server/server.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* fix server tests

* fix CI

* update server docs

---------

Co-authored-by: Georgi Gerganov <[email protected]>

b3579

12 Aug 18:36
fc4ca27

ci : fix github workflow vulnerable to script injection (#9008)

Signed-off-by: Diogo Teles Sant'Anna <[email protected]>

b3488

30 Jul 00:55
75af08c

ggml: bugfix: fix the inactive elements is agnostic for risc-v vector…

b3466

25 Jul 23:51
01aec4a

server : add Speech Recognition & Synthesis to UI (#8679)

* server : add Speech Recognition & Synthesis to UI

* server : add Speech Recognition & Synthesis to UI (fixes)

b3423

19 Jul 19:48
87e397d

ggml : fix quant dot product with odd number of blocks (#8549)

* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix odd blocks for ARM_NEON (#8556)

* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix q4_1

* ggml : fix q5_0

* ggml : fix q5_1

* ggml : fix iq4_nl metal

ggml-ci

* ggml : fix q4_0

* ggml : fix q8_0

ggml-ci

* ggml : remove special Q4_0 code for first 2 blocks

* ggml : fix sumf redefinition

---------

Co-authored-by: slaren <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>
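
For context on the odd-block fix above, here is a minimal sketch of the general pattern: a kernel that consumes two blocks per iteration needs a tail loop so that a row with an odd number of blocks is still summed completely. The names `BlockQ8`, `kBlockSize`, `dot_block`, and `dot_q8_rows` are hypothetical and exist only for this illustration; the block layout is simplified (a `float` scale and plain scalar code) and is not the ggml implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption for this sketch: 32 quantized values per block.
constexpr int kBlockSize = 32;

// Hypothetical, simplified block: one scale plus 8-bit quantized values.
struct BlockQ8 {
    float  d;               // per-block scale
    int8_t qs[kBlockSize];  // quantized values
};

// Dot product of a single pair of blocks.
static float dot_block(const BlockQ8 & x, const BlockQ8 & y) {
    int32_t sumi = 0;
    for (int j = 0; j < kBlockSize; ++j) {
        sumi += int32_t(x.qs[j]) * int32_t(y.qs[j]);
    }
    return x.d * y.d * float(sumi);
}

// Dot product of two rows of blocks, unrolled by two with a tail loop.
float dot_q8_rows(const std::vector<BlockQ8> & x, const std::vector<BlockQ8> & y) {
    assert(x.size() == y.size());
    const size_t nb = x.size();

    float  sumf = 0.0f;
    size_t ib   = 0;

    // Main loop: two blocks per iteration, stops before a lone last block.
    for (; ib + 1 < nb; ib += 2) {
        sumf += dot_block(x[ib], y[ib]) + dot_block(x[ib + 1], y[ib + 1]);
    }

    // Tail loop: picks up the remaining block when nb is odd.
    for (; ib < nb; ++ib) {
        sumf += dot_block(x[ib], y[ib]);
    }

    return sumf;
}
```

Keeping the loop index `ib` shared between the unrolled loop and the tail loop means no block is special-cased at the start of the row, and the same code path handles even and odd block counts.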

b3374

11 Jul 16:58
b078c61

cuda : suppress 'noreturn' warn in no_device_code (#8414)

* cuda : suppress 'noreturn' warn in no_device_code

This commit adds a while(true) loop to the no_device_code function in
common.cuh. This is done to suppress the warning:

```console
/ggml/src/ggml-cuda/template-instances/../common.cuh:346:1: warning:
function declared 'noreturn' should not return [-Winvalid-noreturn]
  346 | }
      | ^
```

The motivation for this is to reduce the number of warnings when
compiling with GGML_HIPBLAS=ON.

Signed-off-by: Daniel Bevenius <[email protected]>

* squash! cuda : suppress 'noreturn' warn in no_device_code

Update __trap macro instead of using a while loop to suppress the
warning.

Signed-off-by: Daniel Bevenius <[email protected]>

---------

Signed-off-by: Daniel Bevenius <[email protected]>
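
To illustrate the kind of warning discussed in the commit message above, the following is a rough host-side sketch, not the actual change to `common.cuh`. It assumes a GCC- or Clang-compatible compiler (for `__builtin_unreachable`). The names `trap_primitive`, `TRAP_SKETCH`, `no_device_code_sketch`, and `checked_div` are hypothetical. The idea is that when the trap primitive is not itself declared as non-returning, the compiler may see a path that falls off the end of a `[[noreturn]]` function; marking the point after the trap as unreachable inside the macro removes that path.

```cpp
#include <cstdlib>

// Hypothetical trap primitive that is NOT declared [[noreturn]], so the
// compiler cannot rely on its declaration to know control stops here.
static void trap_primitive() {
    std::abort();
}

// Hypothetical macro: trap, then mark the following point as unreachable.
// __builtin_unreachable is a GCC/Clang builtin (assumption of this sketch).
#define TRAP_SKETCH() do { trap_primitive(); __builtin_unreachable(); } while (0)

// A function declared as never returning. Ending it with the macro above
// leaves no visible path on which it could return.
[[noreturn]] static void no_device_code_sketch() {
    TRAP_SKETCH();
}

// Example caller: only the invalid-input path reaches the trap.
int checked_div(int a, int b) {
    if (b == 0) {
        no_device_code_sketch();
    }
    return a / b;
}
```

Putting the fix in the macro rather than adding a `while(true)` loop to each caller keeps every use of the trap covered by a single definition.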