Releases: ochafik/llama.cpp

b4508

18 Jan 18:44
a1649cc

Adding linenoise.cpp to llama-run (#11252)

This is a fork of linenoise that is C++17 compatible. I intend to add
it to llama-run so we can do things like traverse prompt history via
the up and down arrow keys:

https://github.com/ericcurtin/linenoise.cpp

Signed-off-by: Eric Curtin <[email protected]>
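As a rough illustration of the up/down-arrow history traversal this commit is aiming for, here is a minimal stand-alone sketch of a prompt-history buffer. This is a hypothetical illustration, not the linenoise.cpp API:

```cpp
#include <string>
#include <vector>

// Minimal prompt-history buffer illustrating up/down-arrow traversal.
// Hypothetical sketch, not the linenoise.cpp API.
class PromptHistory {
    std::vector<std::string> entries;
    size_t cursor = 0; // entries.size() means "the current (empty) line"
public:
    void add(const std::string &line) {
        entries.push_back(line);
        cursor = entries.size();
    }
    // Up arrow: step back to the previous entry, clamped at the oldest.
    std::string up() {
        if (cursor > 0) --cursor;
        return cursor < entries.size() ? entries[cursor] : "";
    }
    // Down arrow: step forward, eventually returning to the empty line.
    std::string down() {
        if (cursor < entries.size()) ++cursor;
        return cursor < entries.size() ? entries[cursor] : "";
    }
};
```

In a line editor, `up()`/`down()` would be wired to the arrow-key escape sequences and the returned string would replace the current edit buffer.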

b4502

18 Jan 01:24
3edfa7d

llama.android: add field formatChat to control whether to parse speci…

b4476

13 Jan 22:55
504af20

server : (UI) Improve messages bubble shape in RTL (#11220)

I had simply overlooked the message bubble's tail placement for RTL
text, since I use dark mode where it isn't visible; this fixes it.

b4474

13 Jan 19:58
39509fb

cuda : CUDA Graph Compute Function Refactor (precursor for performanc…

b4397

29 Dec 19:58
a813bad

vulkan: im2col and matmul optimizations for stable diffusion (#10942)

* tests: Add im2col perf tests

* vulkan: optimize im2col, more elements per thread

* vulkan: increase small tile size for NV_coopmat2

* vulkan: change im2col to 512 elements per workgroup

b4393

26 Dec 20:27
d79d8f3

vulkan: multi-row k quants (#10846)

* multi row k quant shaders!

* better row selection

* more row choices

* readjust row selection

* rm_kq=2 by default
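Handling multiple matrix rows per workgroup (the `rm_kq=2` default above) roughly halves the number of workgroups dispatched for k-quant mat-vec. A sketch of the dispatch arithmetic, assuming a hypothetical helper rather than the actual shader code:

```cpp
// Dispatch math for multi-row mat-vec shaders: with rows_per_wg rows
// handled per workgroup, fewer workgroups are launched overall.
// Hypothetical helper, not actual llama.cpp code.
constexpr int num_workgroups(int nrows, int rows_per_wg) {
    return (nrows + rows_per_wg - 1) / rows_per_wg; // ceiling division
}
```

Fewer, larger workgroups amortize per-workgroup overhead, which is why the "row selection" tuning in these commits matters.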

b4327

14 Dec 15:47
ba1cb19

llama : add Qwen2VL support + multimodal RoPE (#10361)

* Barebone Qwen2VL LLM convertor

* Add Qwen2VL cli entrypoint

* [WIP] add qwen2vl arch

* Verify m-rope output

* Add vl-rope/2d-rope support for qwen2vl ViT

* update qwen2vl cli tool

* update 5D tensor op workaround

* [WIP] qwen2vl vision model

* make batch and clip utils compatible with qwen2vl

* [WIP] create inference workflow, gguf convert script but fix

* correcting vision-rope behavior, add the missing last layer back to ViT

* add arg parser to qwen2vl_surgery

* replace variable size array with vector

* cuda-gdb cmake preset

* add fp32 mrope, vision rope kernel

* add fp16 support for qwen2vl and m-rope

* add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION`

* fix rope op mode switching, outdated func args

* update `llama_hparams`

* update to keep up with upstream changes

* resolve linter, test errors

* add makefile entry, update special image padding token

* add mrope unit test, fix few compiler warnings

* rename `mrope` related function, params

* minor updates on debug util, bug fixes

* add `m-rope` testcase to `test-backend-ops`

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <[email protected]>

* fix trailing whitespace

* store `llama_hparams.rope_sections` with fixed size array

* update position id tensor size check in GGML_OP_ROPE

* minor updates

* update `ggml_backend_*_supports_op` of unsupported backends

* remove old `rope_section` compare operator

---------

Co-authored-by: Georgi Gerganov <[email protected]>
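Qwen2VL's multimodal RoPE splits the rotary dimensions into sections, each driven by its own position component (e.g. temporal/height/width), which is what `llama_hparams.rope_sections` and `GGML_ROPE_TYPE_MROPE` above encode. A simplified sketch of the section lookup, assuming three sections (this is an illustration, not the ggml kernel):

```cpp
#include <array>

// Simplified m-rope section lookup: given a rotary dimension-pair index,
// pick which position component (e.g. time/height/width) drives its
// rotation angle. Hypothetical sketch; the real kernel lives behind
// GGML_OP_ROPE with GGML_ROPE_TYPE_MROPE and uses rope_sections.
int mrope_section_for_dim(int dim_pair, const std::array<int, 3> &sections) {
    int acc = 0;
    for (int s = 0; s < 3; ++s) {
        acc += sections[s];
        if (dim_pair < acc) return s;
    }
    return 2; // trailing dims fall into the last section
}
```

Text tokens use the same position for every section, so m-rope degenerates to standard RoPE there; image patches get distinct temporal/spatial positions per section.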

b4295

10 Dec 03:04
26a8406

CUDA: fix shared memory access condition for mmv (#10740)

b4291

09 Dec 00:50
ce8784b

server : fix format_infill (#10724)

* server : fix format_infill

* fix

* rename

* update test

* use another model

* update test

* update test

* test_invalid_input_extra_req

b4274

06 Dec 00:41
7736837

fix(server) : do not show alert when DONE is received (#10674)