Releases: ochafik/llama.cpp

b4508

18 Jan 18:44
a1649cc

Adding linenoise.cpp to llama-run (#11252)

This is a fork of linenoise that is C++17 compatible. I intend to add
it to llama-run so we can do things like traverse prompt history via
the up and down arrow keys:

https://github.com/ericcurtin/linenoise.cpp

Signed-off-by: Eric Curtin <[email protected]>
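As a rough illustration of the up/down-arrow history traversal this commit is aiming for, here is a minimal stand-alone sketch of a prompt-history buffer. This is a hypothetical illustration, not the linenoise.cpp API:

```cpp
#include <string>
#include <vector>

// Minimal prompt-history buffer illustrating up/down-arrow traversal.
// Hypothetical sketch, not the linenoise.cpp API.
class PromptHistory {
    std::vector<std::string> entries;
    size_t cursor = 0; // entries.size() means "the current (empty) line"
public:
    void add(const std::string &line) {
        entries.push_back(line);
        cursor = entries.size();
    }
    // Up arrow: step back to the previous entry, clamped at the oldest.
    std::string up() {
        if (cursor > 0) --cursor;
        return cursor < entries.size() ? entries[cursor] : "";
    }
    // Down arrow: step forward, eventually returning to the empty line.
    std::string down() {
        if (cursor < entries.size()) ++cursor;
        return cursor < entries.size() ? entries[cursor] : "";
    }
};
```

In a line editor, `up()`/`down()` would be wired to the arrow-key escape sequences and the returned string would replace the current edit buffer.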

b4502

18 Jan 01:24
3edfa7d

llama.android: add field formatChat to control whether to parse speci…

b4476

13 Jan 22:55
504af20

server : (UI) Improve messages bubble shape in RTL (#11220)

I had simply overlooked the message bubble's tail placement for RTL
text, since I use dark mode where it isn't visible; this fixes it.

b4474

13 Jan 19:58
39509fb

cuda : CUDA Graph Compute Function Refactor (precursor for performanc…

b4397

29 Dec 19:58
a813bad

vulkan: im2col and matmul optimizations for stable diffusion (#10942)

* tests: Add im2col perf tests

* vulkan: optimize im2col, more elements per thread

* vulkan: increase small tile size for NV_coopmat2

* vulkan: change im2col to 512 elements per workgroup

b4393

26 Dec 20:27
d79d8f3

vulkan: multi-row k quants (#10846)

* multi row k quant shaders!

* better row selection

* more row choices

* readjust row selection

* rm_kq=2 by default
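Handling multiple matrix rows per workgroup (the `rm_kq=2` default above) roughly halves the number of workgroups dispatched for k-quant mat-vec. A sketch of the dispatch arithmetic, assuming a hypothetical helper rather than the actual shader code:

```cpp
// Dispatch math for multi-row mat-vec shaders: with rows_per_wg rows
// handled per workgroup, fewer workgroups are launched overall.
// Hypothetical helper, not actual llama.cpp code.
constexpr int num_workgroups(int nrows, int rows_per_wg) {
    return (nrows + rows_per_wg - 1) / rows_per_wg; // ceiling division
}
```

Fewer, larger workgroups amortize per-workgroup overhead, which is why the "row selection" tuning in these commits matters.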

b4327

14 Dec 15:47
ba1cb19

llama : add Qwen2VL support + multimodal RoPE (#10361)

* Barebone Qwen2VL LLM convertor

* Add Qwen2VL cli entrypoint

* [WIP] add qwen2vl arch

* Verify m-rope output

* Add vl-rope/2d-rope support for qwen2vl ViT

* update qwen2vl cli tool

* update 5D tensor op workaround

* [WIP] qwen2vl vision model

* make batch and clip utils compatible with qwen2vl

* [WIP] create inference workflow, gguf convert script but fix

* correcting vision-rope behavior, add the missing last layer back to ViT

* add arg parser to qwen2vl_surgery

* replace variable size array with vector

* cuda-gdb cmake preset

* add fp32 mrope, vision rope kernel

* add fp16 support for qwen2vl and m-rope

* add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION`

* fix rope op mode switching, outdated func args

* update `llama_hparams`

* update to keep up with upstream changes

* resolve linter, test errors

* add makefile entry, update special image padding token

* add mrope unit test, fix few compiler warnings

* rename `mrope` related function, params

* minor updates on debug util, bug fixes

* add `m-rope` testcase to `test-backend-ops`

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <[email protected]>

* fix trailing whitespace

* store `llama_hparams.rope_sections` with fixed size array

* update position id tensor size check in GGML_OP_ROPE

* minor updates

* update `ggml_backend_*_supports_op` of unsupported backends

* remove old `rope_section` compare operator

---------

Co-authored-by: Georgi Gerganov <[email protected]>
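Qwen2VL's multimodal RoPE splits the rotary dimensions into sections, each driven by its own position component (e.g. temporal/height/width), which is what `llama_hparams.rope_sections` and `GGML_ROPE_TYPE_MROPE` above encode. A simplified sketch of the section lookup, assuming three sections (this is an illustration, not the ggml kernel):

```cpp
#include <array>

// Simplified m-rope section lookup: given a rotary dimension-pair index,
// pick which position component (e.g. time/height/width) drives its
// rotation angle. Hypothetical sketch; the real kernel lives behind
// GGML_OP_ROPE with GGML_ROPE_TYPE_MROPE and uses rope_sections.
int mrope_section_for_dim(int dim_pair, const std::array<int, 3> &sections) {
    int acc = 0;
    for (int s = 0; s < 3; ++s) {
        acc += sections[s];
        if (dim_pair < acc) return s;
    }
    return 2; // trailing dims fall into the last section
}
```

Text tokens use the same position for every section, so m-rope degenerates to standard RoPE there; image patches get distinct temporal/spatial positions per section.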

b4295

10 Dec 03:04
26a8406

CUDA: fix shared memory access condition for mmv (#10740)

b4291

09 Dec 00:50
ce8784b

server : fix format_infill (#10724)

* server : fix format_infill

* fix

* rename

* update test

* use another model

* update test

* update test

* test_invalid_input_extra_req

b4274

06 Dec 00:41
7736837

fix(server) : do not show alert when DONE is received (#10674)