Releases · ngxson/llama.cpp
b6249
ggml WebGPU: add support for quantization types (#15440)
* Begin work on set_rows
* Work on set rows
* Add error buffers for reporting unsupported SET_ROWS indices
* Remove extra comments
* Work on templating for different types in shaders
* Work on shader type generation
* Working q4_0 mul_mat and some templating for different types
* Add q4_0_f16 matmul and fix device init
* Add matmul support for basic quantization types
* Add q2_k and q3_k quantization
* Add rest of k-quants
* Get first i-quant working
* Closer to supporting all i-quants
* Support rest of i-quants
* Cleanup code
* Fix python formatting
* debug
* Bugfix for memset
* Add padding to end of buffers on creation
* Simplify bit-shifting
* Update usage of StringView
b6248
model : gpt-oss add response_format support (#15494)
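Editor's note: `response_format` is the field used for structured output on llama-server's OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below only illustrates a request body you could POST there; the host/port, the model name, and exactly how gpt-oss handles the schema are assumptions, not details taken from this release note.

```cpp
// Hedged sketch: a request body for llama-server's OpenAI-compatible
// /v1/chat/completions endpoint asking for JSON-only output via response_format.
// Endpoint path and field names follow the OpenAI-style API that llama-server
// mimics; the model name and port are illustrative assumptions.
#include <cstdio>

int main() {
    const char * body = R"({
      "model": "gpt-oss",
      "messages": [
        { "role": "user", "content": "List three prime numbers as a JSON array." }
      ],
      "response_format": { "type": "json_object" }
    })";
    // e.g. save and POST with: curl http://localhost:8080/v1/chat/completions -d @body.json
    std::printf("%s\n", body);
    return 0;
}
```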
b6247
ggml: add `conv3d` op (#15182) * add conv3d * bump GGML_OP_COUNT
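Editor's note: the release line only names the new op. As a reminder of the geometry a 3-D convolution produces, here is a hedged sketch using the standard output-size formula; the parameter names are illustrative and this is not ggml's actual `conv3d` API.

```cpp
// Hedged sketch: output size of a 3D convolution along one axis, using the
// standard formula out = (in + 2*pad - dilation*(kernel-1) - 1) / stride + 1.
// Illustrates the op's geometry only; not the ggml conv3d signature.
#include <cstdio>

static int conv_out_size(int in, int kernel, int stride, int pad, int dilation) {
    return (in + 2 * pad - dilation * (kernel - 1) - 1) / stride + 1;
}

int main() {
    // A 16x16x16 volume, 3x3x3 kernel, stride 1, padding 1 -> 16x16x16 output.
    int d = conv_out_size(16, 3, 1, 1, 1);
    int h = conv_out_size(16, 3, 1, 1, 1);
    int w = conv_out_size(16, 3, 1, 1, 1);
    std::printf("conv3d output: %d x %d x %d\n", d, h, w);
    return 0;
}
```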
b6246
cuda : add Pad Reflect 1D support (#14659)
* Add Pad Reflect 1D CUDA support
* Update ggml/src/ggml-cuda/pad_reflect_1d.cu
Co-authored-by: Johannes Gäßler
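Editor's note: for context on what the op computes, a hedged scalar sketch of 1-D reflect padding, where values are mirrored around the first/last sample without repeating the edge. This illustrates the convention only and is not the CUDA kernel.

```cpp
// Hedged sketch: 1D reflect padding. For input [1 2 3 4] and pad=2 the result is
// [3 2 1 2 3 4 3 2] -- indices mirror around the edge element without repeating it.
// Illustration only, not the ggml/CUDA implementation.
#include <cstdio>
#include <vector>

static std::vector<float> pad_reflect_1d(const std::vector<float> & x, int p0, int p1) {
    const int n = (int) x.size();
    std::vector<float> out;
    out.reserve(n + p0 + p1);
    for (int i = p0; i >= 1; --i) out.push_back(x[i]);          // left mirror
    out.insert(out.end(), x.begin(), x.end());                  // original data
    for (int i = 1; i <= p1; ++i) out.push_back(x[n - 1 - i]);  // right mirror
    return out;
}

int main() {
    for (float v : pad_reflect_1d({1, 2, 3, 4}, 2, 2)) std::printf("%g ", v);
    std::printf("\n");
    return 0;
}
```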
b6244
ggml-cpu: Support Q5_0 and Q5_1 on s390x (#15486)
* ggml-cpu: initial q5_0 impl for s390x
* ggml-cpu: updated q5_0 code for better performance
* ggml-cpu: use optimised hsum for better performance
* ggml-cpu: introduce q5_1 simd + refactor q5_0
* ggml-cpu: fix incorrect return type vec_hsum
* ggml-cpu: q5_0 incomplete refactor + table_b2b_0 activation
* ggml-cpu: refactor q5_1
* ggml-cpu: q5_1 update loop unroll to 4
* ggml-cpu: update q5_0 unroll to 4
* ggml-cpu: update build-s390x docs
* ggml-cpu: update unused variables q5_0
* docs: update the last update date
Signed-off-by: Aaron Teo
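Editor's note: for readers unfamiliar with these formats, a hedged sketch of how a single Q5_0 / Q5_1 weight is reconstructed. Only the per-value formulas are shown; the ggml byte-level block packing (32 values per block, 4 low bits plus 1 high bit per value) is deliberately omitted.

```cpp
// Hedged sketch of Q5_0 / Q5_1 dequantization. Each weight is a 5-bit integer q
// stored per 32-value block with a scale d (and, for Q5_1, an offset m):
//   Q5_0: x = d * (q - 16)   (symmetric, q in [0, 31])
//   Q5_1: x = d * q + m      (asymmetric)
// Byte-level packing is omitted; this shows the math only.
#include <cstdio>
#include <cstdint>

static float dequant_q5_0(float d, uint8_t q)          { return d * (int(q) - 16); }
static float dequant_q5_1(float d, float m, uint8_t q) { return d * int(q) + m; }

int main() {
    std::printf("q5_0: d=0.1, q=20        -> %.2f\n", dequant_q5_0(0.1f, 20));
    std::printf("q5_1: d=0.1, m=-1, q=20  -> %.2f\n", dequant_q5_1(0.1f, -1.0f, 20));
    return 0;
}
```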
b6242
readme : model : mtmd : lfm2 improvements (#15476)
* Support untied embeddings
* Increase number of image tokens to 1024
* Add LFM2-VL to readme
* Actually use untied embeddings
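Editor's note on "untied embeddings": the model keeps a separate output-projection (lm_head) matrix instead of reusing the token-embedding matrix to produce logits. The sketch below is a minimal, hedged illustration of that distinction; the `TinyLM` type and its fields are hypothetical, not LFM2 or llama.cpp code.

```cpp
// Minimal hedged sketch of tied vs. untied embeddings (illustrative only).
// Tied: the output head reuses the V x H token-embedding matrix for logits.
// Untied: the model ships a separate V x H lm_head matrix.
#include <cstdio>
#include <vector>

struct TinyLM {
    int V = 0, H = 0;
    std::vector<float> tok_emb;  // V x H, embeds input tokens
    std::vector<float> lm_head;  // V x H, only used when untied
    bool tied = true;

    // logits[v] = dot(row v of the output matrix, final hidden state h)
    std::vector<float> logits(const std::vector<float> & h) const {
        const std::vector<float> & W = tied ? tok_emb : lm_head;
        std::vector<float> out(V, 0.0f);
        for (int v = 0; v < V; ++v)
            for (int i = 0; i < H; ++i)
                out[v] += W[v * H + i] * h[i];
        return out;
    }
};

int main() {
    TinyLM m;
    m.V = 2; m.H = 2;
    m.tok_emb = {1, 0,  0, 1};
    m.lm_head = {2, 0,  0, 2};   // separate head: the untied case
    m.tied = false;
    for (float l : m.logits({0.5f, 0.25f})) std::printf("%.2f ", l);
    std::printf("\n");
    return 0;
}
```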
b6241
CANN: Optimize RMS_NORM using cache (#15419)
* [CANN] Optimize RMS_NORM using cache
* fix typo
* fix review comment
* codestyle adjustment
Signed-off-by: noemotiovon
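Editor's note: RMS_NORM normalizes a vector by its root-mean-square. A hedged reference sketch of the math the op computes is below; any learned per-channel scale is applied as a separate multiply in llama.cpp graphs, and the CANN cache optimization itself is not shown.

```cpp
// Hedged reference sketch of RMS normalization: y_i = x_i / sqrt(mean(x^2) + eps).
// Scalar reference for clarity only; not the CANN kernel.
#include <cmath>
#include <cstdio>
#include <vector>

static void rms_norm(const std::vector<float> & x, std::vector<float> & y, float eps) {
    double sum_sq = 0.0;
    for (float v : x) sum_sq += (double) v * v;
    const float scale = 1.0f / std::sqrt((float) (sum_sq / x.size()) + eps);
    y.resize(x.size());
    for (size_t i = 0; i < x.size(); ++i) y[i] = x[i] * scale;
}

int main() {
    std::vector<float> x = {1.0f, -2.0f, 3.0f, -4.0f}, y;
    rms_norm(x, y, 1e-6f);
    for (float v : y) std::printf("%.3f ", v);
    std::printf("\n");
    return 0;
}
```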
b6240
sched : fix possible use of wrong ids tensor when offloading moe prom…
b6239
llama : remove deprecated llama_kv_self API (#15472)
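Editor's note: code still calling the removed `llama_kv_self_*` functions needs to move to the newer memory API. The sketch below shows the migration this editor believes applies (via `llama_get_memory` / `llama_memory_clear`); treat it as an assumption and verify against the `llama.h` shipped with your build.

```cpp
// Hedged migration sketch (assumption, verify against your llama.h):
// the deprecated llama_kv_self_* calls were superseded by the memory API,
// e.g. clearing the KV cache goes through llama_get_memory / llama_memory_clear.
#include "llama.h"

void clear_kv(llama_context * ctx) {
    // old (removed in this release): llama_kv_self_clear(ctx);
    llama_memory_clear(llama_get_memory(ctx), /*data=*/true);
}
```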
b6237
vulkan : support conv_2d_dw with f16 weights (#15392)
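Editor's note: `conv_2d_dw` is a depthwise 2-D convolution, where each channel is convolved with its own single-channel filter and there is no mixing across channels. Below is a hedged scalar reference of that behaviour; the Vulkan f16 shader path itself is not shown.

```cpp
// Hedged sketch of a depthwise 2D convolution: each of the C channels is
// convolved with its own K x K filter; no mixing across channels.
// Plain scalar reference for clarity, not the Vulkan f16 shader.
#include <vector>

// x: C x H x W, w: C x K x K, valid padding, stride 1 -> y: C x (H-K+1) x (W-K+1)
static std::vector<float> conv_2d_dw(const std::vector<float> & x, const std::vector<float> & w,
                                     int C, int H, int W, int K) {
    const int OH = H - K + 1, OW = W - K + 1;
    std::vector<float> y(C * OH * OW, 0.0f);
    for (int c = 0; c < C; ++c)
        for (int oy = 0; oy < OH; ++oy)
            for (int ox = 0; ox < OW; ++ox) {
                float acc = 0.0f;
                for (int ky = 0; ky < K; ++ky)
                    for (int kx = 0; kx < K; ++kx)
                        acc += x[(c * H + oy + ky) * W + ox + kx] * w[(c * K + ky) * K + kx];
                y[(c * OH + oy) * OW + ox] = acc;
            }
    return y;
}
```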