
Releases: ngxson/llama.cpp

b4786

28 Feb 08:04
fbeda90
vulkan: matmul dequantization improvements (#12015)

* faster dequant for old quants

* don't use unpack for iq4_nl

* vec2 unpack for q8

b4784

27 Feb 08:28
b95c8af
cmake: Fix ggml backend dependencies and installation (#11818)

* Fix dependencies between ggml and backends

ggml backends now link only to ggml-base, and ggml links to all backends.

* Fix installation of ggml backends

Set up GNUInstallDirs before setting the installation directory of the ggml backends.

b4783

26 Feb 15:03
a800ae4
llava : add struct for FFI bindgen (#12079)

* add struct for FFI bindgen

* Apply suggestions from code review

---------

Co-authored-by: Xuan-Son Nguyen <[email protected]>

b4778

25 Feb 16:09
a82c9e7
vulkan: fix assertion when qy_needs_dequant (#12068)

Looks like a copy/paste bug from qx_needs_dequant.

b4777

25 Feb 12:45
401af80
server: handle echo=false on /v1/completions (#12060)
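In the OpenAI-compatible completions API, `echo=false` means the prompt is not prepended to the returned text. A minimal request-payload sketch in Python (the URL and port are assumptions matching a default local llama-server, not part of this release note):

```python
import json

# Hypothetical local endpoint; adjust host/port for your llama-server instance.
url = "http://localhost:8080/v1/completions"

payload = {
    "prompt": "The capital of France is",
    "max_tokens": 8,
    "echo": False,  # response should contain only the completion, not the prompt
}

body = json.dumps(payload)
```

Send `body` with any HTTP client; with `echo=false` the `text` field of each returned choice holds only the generated continuation.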

b4776

25 Feb 12:20
c132239
add OP sigmoid (#12056)

Co-authored-by: Judd <[email protected]>
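For reference, the sigmoid operator added here computes σ(x) = 1 / (1 + e^(−x)). A minimal numerically stable sketch in Python (illustrative only, not the ggml implementation):

```python
import math

def sigmoid(x: float) -> float:
    # Branch on the sign so exp() never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

print(sigmoid(0.0))  # 0.5
```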

b4775

25 Feb 12:13
393fca6
ggml-cpu: Fix build with sve (#12059)

* ggml-cpu: Fix build with sve

Signed-off-by: Molly Sophia <[email protected]>

* ggml-cpu: Remove unused variable in sve q3_k vec dot

Signed-off-by: Molly Sophia <[email protected]>

---------

Signed-off-by: Molly Sophia <[email protected]>

b4774

25 Feb 11:53
61d4f39
vulkan: implement more backpropagation operators (#11914)

* vulkan: implement GGML_OP_ROPE_BACK

* vulkan: implement GGML_OP_RMS_NORM_BACK

* vulkan: implement GGML_OP_SILU_BACK

* vulkan: implement GGML_OP_SOFTMAX_BACK
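As a cross-check of what such a `*_BACK` operator computes: SILU_BACK applies the derivative of silu(x) = x·σ(x), which is σ(x)·(1 + x·(1 − σ(x))). A hedged Python sketch (a mathematical illustration, not the Vulkan shader):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def silu(x: float) -> float:
    return x * sigmoid(x)

def silu_back(x: float, grad: float) -> float:
    # Chain rule: dL/dx = dL/dy * d(silu)/dx, with
    # d(silu)/dx = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
    s = sigmoid(x)
    return grad * s * (1.0 + x * (1.0 - s))

# Sanity check against a central finite difference at x = 0.7:
x, h = 0.7, 1e-6
numeric = (silu(x + h) - silu(x - h)) / (2 * h)
analytic = silu_back(x, 1.0)
assert abs(numeric - analytic) < 1e-6
```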

b4773

25 Feb 11:21
0b52745
server: support add_generation_prompt query param (#12062)

b4771

25 Feb 10:16
3e9a286
llama : expose llama_model_n_head_kv in the API (#11997)

It's useful to have this available from the library layer, as it's a key
parameter of the model (e.g. for figuring out how much KV cache memory is
needed).
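One use mentioned above is estimating KV cache memory from the KV head count. A back-of-the-envelope sketch in Python (the formula and all parameter values are illustrative assumptions, not llama.cpp's internal accounting):

```python
def kv_cache_bytes(n_ctx: int, n_layer: int, n_head_kv: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    # K and V are each stored per layer, per KV head, per position:
    # 2 tensors * n_layer * n_ctx * n_head_kv * head_dim elements.
    return 2 * n_layer * n_ctx * n_head_kv * head_dim * bytes_per_elem

# Example: a 7B-class model with grouped-query attention (illustrative numbers).
gib = kv_cache_bytes(n_ctx=4096, n_layer=32, n_head_kv=8,
                     head_dim=128, bytes_per_elem=2) / 2**30
print(f"{gib:.2f} GiB")  # 0.50 GiB
```

With GQA, `n_head_kv` is smaller than the attention head count, which is exactly why reading it through the API (rather than assuming the full head count) matters for sizing.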