Releases · ngxson/llama.cpp
b6249
ggml WebGPU: add support for quantization types (#15440)
* Begin work on set_rows
* Work on set rows
* Add error buffers for reporting unsupported SET_ROWS indices
* Remove extra comments
* Work on templating for different types in shaders
* Work on shader type generation
* Working q4_0 mul_mat and some templating for different types
* Add q4_0_f16 matmul and fix device init
* Add matmul support for basic quantization types
* Add q2_k and q3_k quantization
* Add rest of k-quants
* Get first i-quant working
* Closer to supporting all i-quants
* Support rest of i-quants
* Cleanup code
* Fix python formatting
* debug
* Bugfix for memset
* Add padding to end of buffers on creation
* Simplify bit-shifting
* Update usage of StringView
b6248
model : gpt-oss add response_format support (#15494)
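Editor's note: `response_format` is the field used for structured output on llama-server's OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below only illustrates a request body you could POST there; the host/port, the model name, and exactly how gpt-oss handles the schema are assumptions, not details taken from this release note.

```cpp
// Hedged sketch: a request body for llama-server's OpenAI-compatible
// /v1/chat/completions endpoint asking for JSON-only output via response_format.
// Endpoint path and field names follow the OpenAI-style API that llama-server
// mimics; the model name and port are illustrative assumptions.
#include <cstdio>

int main() {
    const char * body = R"({
      "model": "gpt-oss",
      "messages": [
        { "role": "user", "content": "List three prime numbers as a JSON array." }
      ],
      "response_format": { "type": "json_object" }
    })";
    // e.g. save and POST with: curl http://localhost:8080/v1/chat/completions -d @body.json
    std::printf("%s\n", body);
    return 0;
}
```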
b6247
ggml: add `conv3d` op (#15182) * add conv3d * bump GGML_OP_COUNT
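Editor's note: the release line only names the new op. As a reminder of the geometry a 3-D convolution produces, here is a hedged sketch using the standard output-size formula; the parameter names are illustrative and this is not ggml's actual `conv3d` API.

```cpp
// Hedged sketch: output size of a 3D convolution along one axis, using the
// standard formula out = (in + 2*pad - dilation*(kernel-1) - 1) / stride + 1.
// Illustrates the op's geometry only; not the ggml conv3d signature.
#include <cstdio>

static int conv_out_size(int in, int kernel, int stride, int pad, int dilation) {
    return (in + 2 * pad - dilation * (kernel - 1) - 1) / stride + 1;
}

int main() {
    // A 16x16x16 volume, 3x3x3 kernel, stride 1, padding 1 -> 16x16x16 output.
    int d = conv_out_size(16, 3, 1, 1, 1);
    int h = conv_out_size(16, 3, 1, 1, 1);
    int w = conv_out_size(16, 3, 1, 1, 1);
    std::printf("conv3d output: %d x %d x %d\n", d, h, w);
    return 0;
}
```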
b6246
cuda : add Pad Reflect 1D support (#14659)
* Add Pad Reflect 1D CUDA support
* Update ggml/src/ggml-cuda/pad_reflect_1d.cu
Co-authored-by: Johannes Gäßler
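Editor's note: for context on what the op computes, a hedged scalar sketch of 1-D reflect padding, where values are mirrored around the first/last sample without repeating the edge. This illustrates the convention only and is not the CUDA kernel.

```cpp
// Hedged sketch: 1D reflect padding. For input [1 2 3 4] and pad=2 the result is
// [3 2 1 2 3 4 3 2] -- indices mirror around the edge element without repeating it.
// Illustration only, not the ggml/CUDA implementation.
#include <cstdio>
#include <vector>

static std::vector<float> pad_reflect_1d(const std::vector<float> & x, int p0, int p1) {
    const int n = (int) x.size();
    std::vector<float> out;
    out.reserve(n + p0 + p1);
    for (int i = p0; i >= 1; --i) out.push_back(x[i]);          // left mirror
    out.insert(out.end(), x.begin(), x.end());                  // original data
    for (int i = 1; i <= p1; ++i) out.push_back(x[n - 1 - i]);  // right mirror
    return out;
}

int main() {
    for (float v : pad_reflect_1d({1, 2, 3, 4}, 2, 2)) std::printf("%g ", v);
    std::printf("\n");
    return 0;
}
```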
b6244
ggml-cpu: Support Q5_0 and Q5_1 on s390x (#15486)
* ggml-cpu: initial q5_0 impl for s390x
* ggml-cpu: updated q5_0 code for better performance
* ggml-cpu: use optimised hsum for better performance
* ggml-cpu: introduce q5_1 simd + refactor q5_0
* ggml-cpu: fix incorrect return type vec_hsum
* ggml-cpu: q5_0 incomplete refactor + table_b2b_0 activation
* ggml-cpu: refactor q5_1
* ggml-cpu: q5_1 update loop unroll to 4
* ggml-cpu: update q5_0 unroll to 4
* ggml-cpu: update build-s390x docs
* ggml-cpu: update unused variables q5_0
* docs: update the last update date
Signed-off-by: Aaron Teo
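Editor's note: for readers unfamiliar with these formats, a hedged sketch of how a single Q5_0 / Q5_1 weight is reconstructed. Only the per-value formulas are shown; the ggml byte-level block packing (32 values per block, 4 low bits plus 1 high bit per value) is deliberately omitted.

```cpp
// Hedged sketch of Q5_0 / Q5_1 dequantization. Each weight is a 5-bit integer q
// stored per 32-value block with a scale d (and, for Q5_1, an offset m):
//   Q5_0: x = d * (q - 16)   (symmetric, q in [0, 31])
//   Q5_1: x = d * q + m      (asymmetric)
// Byte-level packing is omitted; this shows the math only.
#include <cstdio>
#include <cstdint>

static float dequant_q5_0(float d, uint8_t q)          { return d * (int(q) - 16); }
static float dequant_q5_1(float d, float m, uint8_t q) { return d * int(q) + m; }

int main() {
    std::printf("q5_0: d=0.1, q=20        -> %.2f\n", dequant_q5_0(0.1f, 20));
    std::printf("q5_1: d=0.1, m=-1, q=20  -> %.2f\n", dequant_q5_1(0.1f, -1.0f, 20));
    return 0;
}
```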
b6242
readme : model : mtmd : lfm2 improvements (#15476)
* Support untied embeddings
* Increase number of image tokens to 1024
* Add LFM2-VL to readme
* Actually use untied embeddings
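Editor's note on "untied embeddings": the model keeps a separate output-projection (lm_head) matrix instead of reusing the token-embedding matrix to produce logits. The sketch below is a minimal, hedged illustration of that distinction; the `TinyLM` type and its fields are hypothetical, not LFM2 or llama.cpp code.

```cpp
// Minimal hedged sketch of tied vs. untied embeddings (illustrative only).
// Tied: the output head reuses the V x H token-embedding matrix for logits.
// Untied: the model ships a separate V x H lm_head matrix.
#include <cstdio>
#include <vector>

struct TinyLM {
    int V = 0, H = 0;
    std::vector<float> tok_emb;  // V x H, embeds input tokens
    std::vector<float> lm_head;  // V x H, only used when untied
    bool tied = true;

    // logits[v] = dot(row v of the output matrix, final hidden state h)
    std::vector<float> logits(const std::vector<float> & h) const {
        const std::vector<float> & W = tied ? tok_emb : lm_head;
        std::vector<float> out(V, 0.0f);
        for (int v = 0; v < V; ++v)
            for (int i = 0; i < H; ++i)
                out[v] += W[v * H + i] * h[i];
        return out;
    }
};

int main() {
    TinyLM m;
    m.V = 2; m.H = 2;
    m.tok_emb = {1, 0,  0, 1};
    m.lm_head = {2, 0,  0, 2};   // separate head: the untied case
    m.tied = false;
    for (float l : m.logits({0.5f, 0.25f})) std::printf("%.2f ", l);
    std::printf("\n");
    return 0;
}
```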
b6241
CANN: Optimize RMS_NORM using cache (#15419)
* [CANN] Optimize RMS_NORM using cache
* fix typo
* fix review comment
* codestyle adjustment
Signed-off-by: noemotiovon
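Editor's note: RMS_NORM normalizes a vector by its root-mean-square. A hedged reference sketch of the math the op computes is below; any learned per-channel scale is applied as a separate multiply in llama.cpp graphs, and the CANN cache optimization itself is not shown.

```cpp
// Hedged reference sketch of RMS normalization: y_i = x_i / sqrt(mean(x^2) + eps).
// Scalar reference for clarity only; not the CANN kernel.
#include <cmath>
#include <cstdio>
#include <vector>

static void rms_norm(const std::vector<float> & x, std::vector<float> & y, float eps) {
    double sum_sq = 0.0;
    for (float v : x) sum_sq += (double) v * v;
    const float scale = 1.0f / std::sqrt((float) (sum_sq / x.size()) + eps);
    y.resize(x.size());
    for (size_t i = 0; i < x.size(); ++i) y[i] = x[i] * scale;
}

int main() {
    std::vector<float> x = {1.0f, -2.0f, 3.0f, -4.0f}, y;
    rms_norm(x, y, 1e-6f);
    for (float v : y) std::printf("%.3f ", v);
    std::printf("\n");
    return 0;
}
```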
b6240
sched : fix possible use of wrong ids tensor when offloading moe prom…
b6239
llama : remove deprecated llama_kv_self API (#15472)
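Editor's note: code still calling the removed `llama_kv_self_*` functions needs to move to the newer memory API. The sketch below shows the migration this editor believes applies (via `llama_get_memory` / `llama_memory_clear`); treat it as an assumption and verify against the `llama.h` shipped with your build.

```cpp
// Hedged migration sketch (assumption, verify against your llama.h):
// the deprecated llama_kv_self_* calls were superseded by the memory API,
// e.g. clearing the KV cache goes through llama_get_memory / llama_memory_clear.
#include "llama.h"

void clear_kv(llama_context * ctx) {
    // old (removed in this release): llama_kv_self_clear(ctx);
    llama_memory_clear(llama_get_memory(ctx), /*data=*/true);
}
```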
b6237
vulkan : support conv_2d_dw with f16 weights (#15392)
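Editor's note: `conv_2d_dw` is a depthwise 2-D convolution, where each channel is convolved with its own single-channel filter and there is no mixing across channels. Below is a hedged scalar reference of that behaviour; the Vulkan f16 shader path itself is not shown.

```cpp
// Hedged sketch of a depthwise 2D convolution: each of the C channels is
// convolved with its own K x K filter; no mixing across channels.
// Plain scalar reference for clarity, not the Vulkan f16 shader.
#include <vector>

// x: C x H x W, w: C x K x K, valid padding, stride 1 -> y: C x (H-K+1) x (W-K+1)
static std::vector<float> conv_2d_dw(const std::vector<float> & x, const std::vector<float> & w,
                                     int C, int H, int W, int K) {
    const int OH = H - K + 1, OW = W - K + 1;
    std::vector<float> y(C * OH * OW, 0.0f);
    for (int c = 0; c < C; ++c)
        for (int oy = 0; oy < OH; ++oy)
            for (int ox = 0; ox < OW; ++ox) {
                float acc = 0.0f;
                for (int ky = 0; ky < K; ++ky)
                    for (int kx = 0; kx < K; ++kx)
                        acc += x[(c * H + oy + ky) * W + ox + kx] * w[(c * K + ky) * K + kx];
                y[(c * OH + oy) * OW + ox] = acc;
            }
    return y;
}
```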