Releases: ngxson/llama.cpp
b4786
vulkan: matmul dequantization improvements (#12015)
* faster dequant for old quants
* don't use unpack for iq4_nl
* vec2 unpack for q8
b4784
cmake: Fix ggml backend dependencies and installation (#11818)
* Fix dependencies between ggml and backends: ggml backends link only to ggml-base, and ggml links to all backends.
* Fix installation of ggml backends: set up GNUInstallDirs before setting the installation directory of ggml backends.
b4783
llava : add struct for FFI bindgen (#12079)
* add struct for FFI bindgen
* Apply suggestions from code review
Co-authored-by: Xuan-Son Nguyen <[email protected]>
b4778
vulkan: fix assertion when qy_needs_dequant (#12068)
Looks like a copy/paste bug from qx_needs_dequant.
b4777
server: handle echo=false on /v1/completions (#12060)
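A minimal sketch of the `echo` semantics this fix concerns: in the OpenAI-compatible completions API, `echo=true` prepends the prompt to the returned text, while `echo=false` returns only the generated completion. The helper `build_completion_response` below is hypothetical, for illustration only, and is not the server's actual code.

```python
import json

def build_completion_response(prompt: str, generated: str, echo: bool) -> dict:
    # With echo=true the prompt is prepended to the generated text;
    # with echo=false only the completion itself is returned.
    text = (prompt + generated) if echo else generated
    return {"object": "text_completion", "choices": [{"index": 0, "text": text}]}

# Example request body a client might POST to /v1/completions:
payload = json.dumps({"prompt": "Hello", "max_tokens": 8, "echo": False})
```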
b4776
add OP sigmoid (#12056)
Co-authored-by: Judd <[email protected]>
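For reference, the new op computes the elementwise logistic sigmoid. A plain-Python sketch of its semantics (not the ggml implementation, which operates on tensors):

```python
import math

def sigmoid(xs):
    # Elementwise logistic sigmoid: 1 / (1 + exp(-x))
    return [1.0 / (1.0 + math.exp(-x)) for x in xs]
```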
b4775
ggml-cpu: Fix build with sve (#12059)
* ggml-cpu: Fix build with sve
* ggml-cpu: Remove unused variable in sve q3_k vec dot
Signed-off-by: Molly Sophia <[email protected]>
b4774
vulkan: implement more backpropagation operators (#11914)
* vulkan: implement GGML_OP_ROPE_BACK
* vulkan: implement GGML_OP_RMS_NORM_BACK
* vulkan: implement GGML_OP_SILU_BACK
* vulkan: implement GGML_OP_SOFTMAX_BACK
b4773
server: support add_generation_prompt query param (#12062)
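What `add_generation_prompt` controls in chat-template rendering: whether the header that cues the model to produce an assistant turn is appended after the conversation. The ChatML-style renderer below is a hypothetical minimal illustration, not the server's template engine.

```python
def render_chatml(messages, add_generation_prompt=True):
    # Hypothetical minimal ChatML-style renderer: each message becomes an
    # <|im_start|>role ... <|im_end|> block; the flag decides whether the
    # trailing assistant header is emitted to prompt a model response.
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out
```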
b4771
llama : expose llama_model_n_head_kv in the API (#11997)
It's useful to be able to have this from the library layer, as it's a key parameter of the model (e.g. to figure out how much KV cache memory is needed).
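A back-of-the-envelope sketch of the sizing use case mentioned above. The formula is an illustrative assumption, not llama.cpp's exact accounting: K and V caches each hold `n_layer * n_head_kv * head_dim * n_ctx` elements, where `n_head_kv` is the value the new `llama_model_n_head_kv` accessor exposes.

```python
def kv_cache_bytes(n_layer, n_head_kv, head_dim, n_ctx, bytes_per_elt=2):
    # Rough estimate: K and V caches (factor of 2) each store one vector
    # of head_dim elements per KV head, per layer, per context position.
    # bytes_per_elt=2 assumes an f16 cache.
    return 2 * n_layer * n_head_kv * head_dim * n_ctx * bytes_per_elt

# e.g. a Llama-2-7B-like config (no GQA): 32 layers, 32 KV heads,
# head_dim 128, 4096-token context, f16 cache.
estimate = kv_cache_bytes(32, 32, 128, 4096)
```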