Releases: ngxson/llama.cpp

b6259

23 Aug 19:59
710dfc4
CUDA: fix half2 -> half conversion for HIP (#15529)
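For context, a minimal device-code sketch of the class of fix named in the title, assuming the problem is a half2-to-half narrowing that CUDA tolerates but HIP rejects; the intrinsics used (__low2half, __high2half, __hadd) exist in both toolchains' fp16 headers. This is an illustration, not the actual #15529 patch.

```cpp
// Illustrative device-code sketch, not the actual #15529 patch.
// Compile with nvcc (CUDA) or hipcc (HIP); only the fp16 header differs.
#ifdef __HIPCC__
#include <hip/hip_fp16.h>
#else
#include <cuda_fp16.h>
#endif

// HIP is stricter than CUDA about narrowing a __half2 to a __half, so the
// element has to be extracted explicitly with the intrinsics both
// toolchains provide (__low2half / __high2half).
__device__ __half low_lane(const __half2 v) {
    return __low2half(v);                         // explicit, portable
}

__device__ __half lane_sum(const __half2 v) {
    return __hadd(__low2half(v), __high2half(v)); // sum of the two lanes
}
```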

b6258

23 Aug 18:31
611f419
vulkan: optimize rms_norm, and allow the work to spread across multip…
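As a reference point for the kernel optimized here: rms_norm is the standard RMSNorm reduction, scaling each row by 1/sqrt(mean(x²) + eps). Below is a minimal scalar C++ reference of that definition, useful for checking an optimized kernel's output; it is a sketch of the math, not the Vulkan shader touched in this release.

```cpp
#include <cmath>

// Scalar reference for RMS norm over one row: y[i] = x[i] / sqrt(mean(x^2) + eps).
// The Vulkan shader computes the same thing, spread across workgroup invocations.
static void rms_norm_ref(const float * x, float * y, int n, float eps) {
    double sum_sq = 0.0;
    for (int i = 0; i < n; ++i) {
        sum_sq += (double) x[i] * x[i];
    }
    const float scale = 1.0f / std::sqrt((float) (sum_sq / n) + eps);
    for (int i = 0; i < n; ++i) {
        y[i] = x[i] * scale;
    }
}
```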

b6257

23 Aug 13:43
b1afcab
model : add support for Seed-OSS (#15490)

* First draft

* Fix linter errors

* Added missing sinks nullptr

* Don't forget the llama-arch!

* We're through to the generation stage.

* Fix post-attention norm

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Fix RoPE type

* Fix tensor name and reorder llm_types

* Update gguf-py/gguf/constants.py

Remove nonexistent FFN_POST_NORM tensor

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.h

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Add basic chat template

* Add chat template tests

* Remake chat template test

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-chat.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Reorder llm type descriptions

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

---------

Co-authored-by: Sigbjørn Skjæret <[email protected]>

b6255

23 Aug 08:56
21dc4dd
chat : fix debug build assertion in trim function (#15520)
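A debug-build assertion in a trim helper is the classic std::isspace pitfall: passing a plain char with a negative value (any non-ASCII byte where char is signed) is undefined behaviour and trips the assert in debug standard-library builds. The sketch below shows the usual remedy (cast to unsigned char); it illustrates the class of bug, not necessarily the exact change in #15520.

```cpp
#include <cctype>
#include <string>

// Trim helper that avoids the debug-build assertion: std::isspace has undefined
// behaviour for negative char values, so each byte is cast to unsigned char first.
static std::string trim(const std::string & s) {
    size_t begin = 0;
    size_t end   = s.size();
    while (begin < end && std::isspace((unsigned char) s[begin])) {
        ++begin;
    }
    while (end > begin && std::isspace((unsigned char) s[end - 1])) {
        --end;
    }
    return s.substr(begin, end - begin);
}
```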

b6254

23 Aug 07:59
289bf41
vulkan: Rewrite synchronization to allow some overlap between nodes (…

b6252

23 Aug 06:53
0a9b43e
vulkan : support ggml_mean (#15393)

* vulkan : support ggml_mean

* vulkan : support sum, sum_rows and mean with non-contiguous tensors

* vulkan : fix subbuffer size not accounting for misalign offset

* tests : add backend-op tests for non-contiguous sum_rows

* cuda : require contiguous src for SUM_ROWS, MEAN support
* sycl : require contiguous src for SUM, SUM_ROWS, ARGSORT support

* require ggml_contiguous_rows in supports_op and expect nb00=1 in the shader
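A minimal CPU-side usage sketch of the op this release wires up for Vulkan: ggml_mean reduces along the first dimension, producing one value per row. It assumes the single-context helpers ggml_new_graph / ggml_graph_compute_with_ctx; in recent ggml trees the latter is declared in ggml-cpu.h (drop that include on older trees where it still lives in ggml.h). The Vulkan backend runs the same graph through ggml-backend instead.

```c
#include "ggml.h"
#include "ggml-cpu.h"   // ggml_graph_compute_with_ctx in recent ggml trees
#include <stdio.h>

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // a 4-element row; ggml_mean reduces along dim 0, one value per row
    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    float * data = (float *) a->data;
    data[0] = 1.0f; data[1] = 2.0f; data[2] = 3.0f; data[3] = 6.0f;

    struct ggml_tensor * m = ggml_mean(ctx, a);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, m);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/1);

    printf("mean = %f\n", ((float *) m->data)[0]);  // 3.0

    ggml_free(ctx);
    return 0;
}
```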

b6250

22 Aug 22:09
e92734d
test-opt: allow slight imprecision (#15503)
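Tolerance relaxations like this usually mean an exact floating-point comparison was replaced, or its bound widened, with a mixed absolute/relative check. A generic sketch of such a check follows; the specific thresholds used by test-opt are not reproduced here.

```cpp
#include <cmath>
#include <algorithm>

// Generic mixed absolute/relative comparison: exact equality is too strict for
// results that accumulate rounding error across backends or optimizer steps.
static bool almost_equal(double a, double b, double abs_tol = 1e-6, double rel_tol = 1e-4) {
    const double diff = std::fabs(a - b);
    return diff <= std::max(abs_tol, rel_tol * std::max(std::fabs(a), std::fabs(b)));
}
```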

b6249

22 Aug 18:43
4536363
ggml WebGPU: add support for quantization types (#15440)

* Begin work on set_rows

* Work on set rows

* Add error buffers for reporting unsupported SET_ROWS indices

* Remove extra comments

* Work on templating for different types in shaders

* Work on shader type generation

* Working q4_0 mul_mat and some templating for different types

* Add q4_0_f16 matmul and fix device init

* Add matmul support for basic quantization types

* Add q2_k and q3_k quantization

* Add rest of k-quants

* Get first i-quant working

* Closer to supporting all i-quants

* Support rest of i-quants

* Cleanup code

* Fix python formatting

* debug

* Bugfix for memset

* Add padding to end of buffers on creation

* Simplify bit-shifting

* Update usage of StringView
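As background for the quantized WebGPU matmuls above, q4_0 is the simplest of these formats: blocks of 32 weights share one fp16 scale and store 4-bit values with an offset of 8. Below is a hedged scalar sketch of the dequantization (block layout as in ggml's reference CPU code; the WGSL shaders perform the same unpacking on the GPU).

```cpp
#include <cstdint>
#include <cstring>

// q4_0 block: 32 weights, one fp16 scale `d`, 16 bytes of packed 4-bit quants.
// Element i sits in the low nibble of qs[i] for i < 16 and in the high nibble
// of qs[i-16] otherwise; the stored value is offset by 8, so x = d * (q - 8).
#define QK4_0 32

struct block_q4_0 {
    uint16_t d;             // scale as IEEE fp16 bits
    uint8_t  qs[QK4_0 / 2]; // packed 4-bit quants
};

// naive fp16 -> fp32 (ignores subnormals/NaN for brevity)
static float fp16_to_fp32(uint16_t h) {
    const uint32_t sign = (uint32_t)(h >> 15) << 31;
    const uint32_t exp  = (h >> 10) & 0x1f;
    const uint32_t man  = h & 0x3ff;
    const uint32_t bits = sign | ((exp ? exp + 112 : 0) << 23) | (man << 13);
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

static void dequantize_q4_0(const block_q4_0 * blk, float * out) {
    const float d = fp16_to_fp32(blk->d);
    for (int i = 0; i < QK4_0 / 2; ++i) {
        out[i]             = d * (float)((blk->qs[i] & 0x0f) - 8);
        out[i + QK4_0 / 2] = d * (float)((blk->qs[i] >> 4)   - 8);
    }
}
```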

b6248

22 Aug 16:28
32732f2
model : gpt-oss add response_format support (#15494)

b6247

22 Aug 14:35
92f7f0a
ggml: add `conv3d` op (#15182)

* add conv3d

* bump GGML_OP_COUNT
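
For orientation, the new op follows the usual convolution size arithmetic: with input extent N, kernel K, stride S, padding P and dilation D along each spatial axis, the output extent is floor((N + 2P - D*(K-1) - 1)/S) + 1. A small helper sketch of that formula follows; it is the generic textbook rule, not necessarily ggml_conv_3d's exact parameter naming or order.

```cpp
// Standard convolution output-size arithmetic, applied per spatial axis
// (depth, height, width) for a 3D convolution. Generic formula only.
static long conv_out_size(long n, long k, long s, long p, long d) {
    return (n + 2 * p - d * (k - 1) - 1) / s + 1;
}

// example: a 16x32x32 volume with a 3x3x3 kernel, stride 1, padding 1,
// dilation 1 keeps its spatial size: conv_out_size(16, 3, 1, 1, 1) == 16,
// and likewise 32 -> 32.
```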