Releases: ngxson/llama.cpp

b6249

22 Aug 18:43
4536363
ggml WebGPU: add support for quantization types (#15440)

* Begin work on set_rows

* Work on set rows

* Add error buffers for reporting unsupported SET_ROWS indices

* Remove extra comments

* Work on templating for different types in shaders

* Work on shader type generation

* Working q4_0 mul_mat and some templating for different types

* Add q4_0_f16 matmul and fix device init

* Add matmul support for basic quantization types

* Add q2_k and q3_k quantization

* Add rest of k-quants

* Get first i-quant working

* Closer to supporting all i-quants

* Support rest of i-quants

* Cleanup code

* Fix python formatting

* debug

* Bugfix for memset

* Add padding to end of buffers on creation

* Simplify bit-shifting

* Update usage of StringView
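
For orientation: the q4_0 matmul work above boils down to dequantizing Q4_0 blocks on the fly inside the dot product. Each block packs 32 weights as one shared scale plus 4-bit quants, two per byte. The sketch below is a plain C++ illustration of that inner loop, not the generated WGSL shader from the PR; the struct and function names are made up here, and the scale is held as a float even though ggml stores it as fp16.

```cpp
// Illustrative sketch of a q4_0 x f32 mat-vec inner loop (not ggml code).
#include <cstdint>

constexpr int QK4_0 = 32;

struct block_q4_0_view {
    float   d;               // shared scale (ggml_half in the real struct)
    uint8_t qs[QK4_0 / 2];   // 32 x 4-bit quants, two per byte, offset by +8
};

// Dot product of one quantized row (nblocks blocks) with a float vector.
float dot_q4_0_f32(const block_q4_0_view * row, const float * v, int nblocks) {
    float sum = 0.0f;
    for (int ib = 0; ib < nblocks; ++ib) {
        const float   d  = row[ib].d;
        const float * vb = v + ib * QK4_0;
        for (int j = 0; j < QK4_0 / 2; ++j) {
            const int x0 = (row[ib].qs[j] & 0x0F) - 8;  // low nibble  -> weight j
            const int x1 = (row[ib].qs[j] >>   4) - 8;  // high nibble -> weight j + 16
            sum += d * x0 * vb[j] + d * x1 * vb[j + QK4_0 / 2];
        }
    }
    return sum;
}
```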

b6248

22 Aug 16:28
32732f2
model : gpt-oss add response_format support (#15494)

b6247

22 Aug 14:35
92f7f0a
ggml: add `conv3d` op (#15182)

* add conv3d

* bump GGML_OP_COUNT
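
The new op computes a standard (direct) 3D convolution. As a reminder of the arithmetic involved, here is a naive single-channel reference in C++; it is not the ggml implementation, and the parameter names and flat indexing are purely illustrative.

```cpp
// Naive single-channel 3D convolution with zero padding (illustration only).
#include <vector>

struct Vol { int d, h, w; };  // depth, height, width of a dense volume

std::vector<float> conv3d_naive(const std::vector<float> & in,  Vol iv,
                                const std::vector<float> & ker, Vol kv,
                                int stride, int pad) {
    const Vol ov = { (iv.d + 2*pad - kv.d) / stride + 1,
                     (iv.h + 2*pad - kv.h) / stride + 1,
                     (iv.w + 2*pad - kv.w) / stride + 1 };
    std::vector<float> out(ov.d * ov.h * ov.w, 0.0f);
    for (int z = 0; z < ov.d; ++z)
    for (int y = 0; y < ov.h; ++y)
    for (int x = 0; x < ov.w; ++x) {
        float acc = 0.0f;
        for (int kz = 0; kz < kv.d; ++kz)
        for (int ky = 0; ky < kv.h; ++ky)
        for (int kx = 0; kx < kv.w; ++kx) {
            const int iz = z*stride + kz - pad;
            const int iy = y*stride + ky - pad;
            const int ix = x*stride + kx - pad;
            if (iz < 0 || iz >= iv.d || iy < 0 || iy >= iv.h || ix < 0 || ix >= iv.w) {
                continue;  // out-of-bounds taps read zero padding
            }
            acc += in[(iz*iv.h + iy)*iv.w + ix] * ker[(kz*kv.h + ky)*kv.w + kx];
        }
        out[(z*ov.h + y)*ov.w + x] = acc;
    }
    return out;
}
```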

b6246

22 Aug 11:49
b1ab918
cuda : add Pad Reflect 1D support (#14659)

* Add Pad Reflect 1D CUDA support

* Update ggml/src/ggml-cuda/pad_reflect_1d.cu

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>
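
Reflective 1D padding mirrors a row around its end points without repeating the edge sample; the CUDA kernel added here accelerates that index mapping. A minimal host-side C++ sketch of the mapping (illustrative only, not the actual kernel):

```cpp
// Reflective 1D padding: mirror around the first and last samples.
#include <vector>

// Assumes pad_left, pad_right <= n - 1 (PyTorch-style reflection pad).
std::vector<float> pad_reflect_1d(const std::vector<float> & x, int pad_left, int pad_right) {
    const int n = (int) x.size();
    std::vector<float> out(n + pad_left + pad_right);
    for (int i = 0; i < (int) out.size(); ++i) {
        int j = i - pad_left;                // position relative to the original row
        if (j < 0)        j = -j;            // mirror around the first sample
        else if (j >= n)  j = 2*(n - 1) - j; // mirror around the last sample
        out[i] = x[j];
    }
    return out;
}
// e.g. pad_reflect_1d({1,2,3,4}, 2, 1) -> {3,2,1,2,3,4,3}
```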

b6244

22 Aug 08:34
ad5c975
ggml-cpu: Support Q5_0 and Q5_1 on s390x (#15486)

* ggml-cpu: initial q5_0 impl for s390x

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: updated q5_0 code for better performance

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: use optimised hsum for better performance

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: introduce q5_1 simd + refactor q5_0

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: fix incorrect return type vec_hsum

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: q5_0 incomplete refactor + table_b2b_0 activation

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: refactor q5_1

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: q5_1 update loop unroll to 4

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: update q5_0 unroll to 4

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: update build-s390x docs

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: update unused variables q5_0

Signed-off-by: Aaron Teo <[email protected]>

* docs: update the last update date

Signed-off-by: Aaron Teo <[email protected]>

---------

Signed-off-by: Aaron Teo <[email protected]>
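
For context, Q5_0 stores 32 weights per block: the low 4 bits of each weight packed two per byte, the fifth (high) bits gathered into a 32-bit mask, and one shared fp16 scale. The SIMD work above vectorizes this unpacking; the scalar C++ sketch below shows the bit manipulation it has to match (names are illustrative, and the scale is taken as a float here rather than fp16).

```cpp
// Scalar reference for unpacking one Q5_0 block (illustration, not the SIMD code).
#include <cstdint>
#include <cstring>

constexpr int QK5_0 = 32;

struct block_q5_0_view {
    float   d;               // shared scale (ggml_half in the real struct)
    uint8_t qh[4];           // high (5th) bits, one per weight
    uint8_t qs[QK5_0 / 2];   // low 4 bits, two weights per byte
};

void dequantize_block_q5_0(const block_q5_0_view & b, float * y) {
    uint32_t qh;
    std::memcpy(&qh, b.qh, sizeof(qh));
    for (int j = 0; j < QK5_0 / 2; ++j) {
        const uint8_t xh_0 = ((qh >> (j +  0)) << 4) & 0x10;  // 5th bit of weight j
        const uint8_t xh_1 = ((qh >> (j + 12))     ) & 0x10;  // 5th bit of weight j + 16
        const int x0 = ((b.qs[j] & 0x0F) | xh_0) - 16;
        const int x1 = ((b.qs[j] >>   4) | xh_1) - 16;
        y[j]             = x0 * b.d;
        y[j + QK5_0 / 2] = x1 * b.d;
    }
}
```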

b6242

22 Aug 07:55
e288693
readme : model : mtmd : lfm2 improvements (#15476)

* Support untied embeddings

* Increase number of image tokens to 1024

* Add LFM2-VL to readme

* Actually use untied embeddings

b6241

22 Aug 06:26
a0f98dd
CANN: Optimize RMS_NORM using cache (#15419)

* [CANN] Optimize RMS_NORM using cache

Signed-off-by: noemotiovon <[email protected]>

* fix typo

Signed-off-by: noemotiovon <[email protected]>

* fix review comment

Signed-off-by: noemotiovon <[email protected]>

* codestyle adjustment

Signed-off-by: noemotiovon <[email protected]>

---------

Signed-off-by: noemotiovon <[email protected]>
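
For reference, GGML_OP_RMS_NORM normalizes each row by its root mean square, y[i] = x[i] / sqrt(mean(x^2) + eps), with no mean subtraction (any learned weight is applied by a separate multiply). The CANN change above is about caching on Ascend rather than this math, but a plain C++ sketch of the per-row computation may help orient the reader:

```cpp
// Per-row RMS normalization (reference sketch, not the CANN kernel).
#include <cmath>
#include <cstddef>

void rms_norm_row(const float * x, float * y, size_t n, float eps) {
    double sum_sq = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum_sq += (double) x[i] * x[i];
    }
    const float scale = 1.0f / std::sqrt((float)(sum_sq / n) + eps);
    for (size_t i = 0; i < n; ++i) {
        y[i] = x[i] * scale;
    }
}
```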

b6240

21 Aug 21:30
54a241f
sched : fix possible use of wrong ids tensor when offloading moe prom…

b6239

21 Aug 16:44
cd36b5e
llama : remove deprecated llama_kv_self API (#15472)

ggml-ci

b6237

21 Aug 15:34
97ae596
vulkan : support conv_2d_dw with f16 weights (#15392)