Releases: ngxson/llama.cpp
b6259
CUDA: fix half2 -> half conversion for HIP (#15529)
b6258
vulkan: optimize rms_norm, and allow the work to spread across multip…
b6257
model : add support for Seed-OSS (#15490)
* First draft
* Fix linter errors
* Added missing sinks nullptr
* Don't forget the llama-arch!
* We're through to the generation stage.
* Fix post-attention norm
* Apply suggestions from code review
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Fix RoPE type
* Fix tensor name and reorder llm_types
* Update gguf-py/gguf/constants.py
  Remove nonexistent FFN_POST_NORM tensor
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Update src/llama-model.h
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Add basic chat template
* Add chat template tests
* Remake chat template test
* Apply suggestions from code review
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Update src/llama-chat.cpp
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Reorder llm type descriptions
* Update src/llama-model.cpp
  Co-authored-by: Sigbjørn Skjæret <[email protected]>

Co-authored-by: Sigbjørn Skjæret <[email protected]>
b6255
chat : fix debug build assertion in trim function (#15520)
b6254
vulkan: Rewrite synchronization to allow some overlap between nodes (…
b6252
vulkan : support ggml_mean (#15393)
* vulkan : support ggml_mean
* vulkan : support sum, sum_rows and mean with non-contiguous tensors
* vulkan : fix subbuffer size not accounting for misalign offset
* tests : add backend-op tests for non-contiguous sum_rows
* cuda : require contiguous src for SUM_ROWS, MEAN support
* sycl : require contiguous src for SUM, SUM_ROWS, ARGSORT support
* require ggml_contiguous_rows in supports_op and expect nb00=1 in the shader
b6250
test-opt: allow slight imprecision (#15503)
b6249
ggml WebGPU: add support for quantization types (#15440)
* Begin work on set_rows
* Work on set rows
* Add error buffers for reporting unsupported SET_ROWS indices
* Remove extra comments
* Work on templating for different types in shaders
* Work on shader type generation
* Working q4_0 mul_mat and some templating for different types
* Add q4_0_f16 matmul and fix device init
* Add matmul support for basic quantization types
* Add q2_k and q3_k quantization
* Add rest of k-quants
* Get first i-quant working
* Closer to supporting all i-quants
* Support rest of i-quants
* Cleanup code
* Fix python formatting
* debug
* Bugfix for memset
* Add padding to end of buffers on creation
* Simplify bit-shifting
* Update usage of StringView
b6248
model : gpt-oss add response_format support (#15494)
b6247
ggml: add `conv3d` op (#15182)
* add conv3d
* bump GGML_OP_COUNT