
Releases: ngxson/llama.cpp

b4770

25 Feb 10:10
58d07a8
metal : copy kernels for quant to F32/F16 conversions (#12017)

metal: use dequantize_q templates

---------

Co-authored-by: Georgi Gerganov <[email protected]>
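
For context, a minimal scalar sketch of what a quant-to-F32 conversion amounts to for the Q4_0 format (illustrative only: the real Metal kernels do this per thread via the dequantize_q templates, and ggml stores the per-block scale as fp16, while plain float is used here for brevity):

```cpp
// Illustrative scalar sketch of dequantizing one Q4_0 block to F32.
// Q4_0 packs 32 values per block as a scale plus 16 bytes of 4-bit quants;
// each value is reconstructed as d * (q - 8).
#include <cstdint>
#include <cstdio>

constexpr int QK4_0 = 32;

struct block_q4_0_f {           // simplified: ggml stores d as fp16, float here for brevity
    float   d;                  // per-block scale
    uint8_t qs[QK4_0 / 2];      // packed 4-bit quants, two per byte
};

void dequantize_block_q4_0(const block_q4_0_f & b, float * y) {
    for (int j = 0; j < QK4_0 / 2; ++j) {
        const int x0 = (b.qs[j] & 0x0F) - 8;   // low nibble
        const int x1 = (b.qs[j] >>   4) - 8;   // high nibble
        y[j]             = x0 * b.d;
        y[j + QK4_0 / 2] = x1 * b.d;
    }
}

int main() {
    block_q4_0_f b = { 0.5f, {} };
    for (int j = 0; j < QK4_0 / 2; ++j) {
        b.qs[j] = uint8_t((j % 16) | ((15 - j % 16) << 4));
    }
    float y[QK4_0];
    dequantize_block_q4_0(b, y);
    printf("y[0]=%f y[16]=%f\n", y[0], y[16]);
}
```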

b4769

24 Feb 22:28
34a846b
opencl: fix for small models (#11950)

* opencl: fix small shape gemv, remove unused extensions

* opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size

* opencl: fix for token length < 4

* opencl: use wave size of 64 for all Adreno GPUs

---------

Co-authored-by: Shawn Gu <[email protected]>
Co-authored-by: Skyler Szot <[email protected]>

b4768

24 Feb 17:10
7a2c913
llava : Add Granite Vision Support (#11794)

* Add super wip scripts for multimodal granite gguf

Signed-off-by: Alex-Brooks <[email protected]>

* Add example for converting mmgranite to gguf

Signed-off-by: Alex-Brooks <[email protected]>

* remove hardcoded path

Signed-off-by: Alex-Brooks <[email protected]>

* Add vision feature layer to gguf params

Signed-off-by: Alex-Brooks <[email protected]>

* Clean up llava surgery and remove name substitution hacks

Signed-off-by: Alex-Brooks <[email protected]>

* Add transformers llava next tensor name mapping

Signed-off-by: Alex-Brooks <[email protected]>

* Make siglip / openclip mutually exclusive

Signed-off-by: Alex-Brooks <[email protected]>

* Fix projector linear substitution

Signed-off-by: Alex-Brooks <[email protected]>

* Fix linear 2 substitution index

Signed-off-by: Alex-Brooks <[email protected]>

* Increase max flattened gridpoints to 64

Signed-off-by: Alex-Brooks <[email protected]>

* Fix hardcoded concat for multiple feature layers

Signed-off-by: Alex-Brooks <[email protected]>

* Pull vision feature layers out of gguf keys

Signed-off-by: Alex-Brooks <[email protected]>

* fix num gridpoints and use all layers

Signed-off-by: Alex-Brooks <[email protected]>

* Avoid dropping last image encoder layer in llava models

Signed-off-by: Alex-Brooks <[email protected]>

* Use 10 for max number of patches

Signed-off-by: Alex-Brooks <[email protected]>

* Standardize vision feature layers

Signed-off-by: Alex-Brooks <[email protected]>

* Cleanup logs

Signed-off-by: Alex-Brooks <[email protected]>

* Update comment for vision feature layer init

Signed-off-by: Alex-Brooks <[email protected]>

* Update notes for alternative to legacy llm conversion script

Signed-off-by: Alex-Brooks <[email protected]>

* Fix notes rendering

Signed-off-by: Alex-Brooks <[email protected]>

* Add v prefix to vision feature layer log

Signed-off-by: Alex-Brooks <[email protected]>

* Use current defaults for feature layer

Signed-off-by: Alex-Brooks <[email protected]>

* Use constant for max gridpoints / feat layers, style fixes

Signed-off-by: Alex-Brooks <[email protected]>

* clarify non-negative feature layers

Signed-off-by: Alex-Brooks <[email protected]>

* Remove CLIP_API from func signature

Signed-off-by: Alex-Brooks <[email protected]>

* Use MAX_IMAGE_FEATURE_LAYERS const in layer calc

Signed-off-by: Alex-Brooks <[email protected]>

* Clarify feature layers are non-negative ints and not uint

Signed-off-by: Alex-Brooks <[email protected]>

* Fix condition for reading feature layers

Signed-off-by: Alex-Brooks <[email protected]>

* pop last llava layer when feature layers are unset

Signed-off-by: Alex-Brooks <[email protected]>

* Fix unset vision layer 0

Signed-off-by: Alex-Brooks <[email protected]>

* Update examples/llava/clip.cpp

Co-authored-by: Xuan-Son Nguyen <[email protected]>

* Reenable assertion for out of bounds get_rows

Signed-off-by: Alex-Brooks <[email protected]>

* Use std vector for gridpoints and feature layers

Signed-off-by: Alex-Brooks <[email protected]>

* Calculate max feature layer at load time

Signed-off-by: Alex-Brooks <[email protected]>

* Include base patch for granite vision allocation

Signed-off-by: Alex-Brooks <[email protected]>

* Fix trailing whitespace

Signed-off-by: Alex-Brooks <[email protected]>

* Add max num patches = 10 back for minicpmv

Signed-off-by: Alex-Brooks <[email protected]>

* Use unordered set to store feature layers

Co-authored-by: Xuan-Son Nguyen <[email protected]>
Signed-off-by: Alex-Brooks <[email protected]>

* Use max feature layer for postnorm

Signed-off-by: Alex-Brooks <[email protected]>

* Apply suggestions from code review

---------

Signed-off-by: Alex-Brooks <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>
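
A hypothetical sketch of the feature-layer bookkeeping described in the bullets above (requested vision feature layers kept in an unordered set, deepest layer computed at load time so later encoder layers can be skipped); the names are illustrative, not the actual clip.cpp identifiers:

```cpp
#include <algorithm>
#include <cstdio>
#include <unordered_set>
#include <vector>

struct vision_hparams_sketch {
    std::unordered_set<int> feature_layers;  // non-negative encoder layer indices
    int max_feature_layer = -1;              // computed once at load time
};

static void set_feature_layers(vision_hparams_sketch & hp, const std::vector<int> & layers) {
    hp.feature_layers.insert(layers.begin(), layers.end());
    for (int il : hp.feature_layers) {
        hp.max_feature_layer = std::max(hp.max_feature_layer, il);
    }
}

int main() {
    vision_hparams_sketch hp;
    set_feature_layers(hp, { 3, 7, 15, 26 });  // e.g. multiple feature layers for Granite Vision
    // only layers 0..max_feature_layer need to be run by the image encoder
    printf("run encoder up to layer %d, collect %zu feature layers\n",
           hp.max_feature_layer, hp.feature_layers.size());
}
```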

b4767

24 Feb 17:02
08d5986
[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)

* optimize performance by reordering for Intel GPU

* detect hw type, save the opt feature, and print it

* correct name

* support optimizing the graph once during graph compute; record the opt status in tensor->extra; make CI pass

* add env variable GGML_SYCL_DISABLE_OPT for debug

* use syclex::architecture to replace the custom hw define; update the guide for GGML_SYCL_DISABLE_OPT

* add performance data

* move getrows functions to separate files

* fix global variables

---------

Co-authored-by: arthw <[email protected]>
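
A minimal sketch of how a debug toggle like GGML_SYCL_DISABLE_OPT is typically read; only the variable name comes from this release, and the exact parsing in ggml-sycl may differ:

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Assumed convention: any value other than "0" disables the optimization.
static bool sycl_opt_disabled() {
    const char * v = std::getenv("GGML_SYCL_DISABLE_OPT");
    return v != nullptr && std::strcmp(v, "0") != 0;
}

int main() {
    if (sycl_opt_disabled()) {
        printf("GGML_SYCL_DISABLE_OPT set: skipping Q4_0 reorder optimization\n");
    } else {
        printf("applying Q4_0 reorder optimization for Intel GPU\n");
    }
}
```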

b4754

22 Feb 06:18
de8b5a3
llama.swiftui : add "Done" dismiss button to help view (#11998)

The commit updates the help view in the llama.swiftui example to use a
NavigationView and a Done button to dismiss the help view.

The motivation for this is that without this change there is no way to
dismiss the help view.

b4753

21 Feb 17:11
51f311e
llama : skip loading unused tensors (#12004)

* llama : assign unknown/unused tensors to host buffer type

ggml-ci

* llama : skip unused tensors

ggml-ci
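
A hypothetical sketch of the skip-unused-tensors idea: tensors present in the GGUF file but never referenced by the model graph are simply not loaded. The tensor names and lookup structure here are illustrative, not llama.cpp internals:

```cpp
#include <cstdio>
#include <string>
#include <unordered_set>
#include <vector>

int main() {
    // tensors the model graph actually consumes (normally derived from the architecture)
    const std::unordered_set<std::string> used = {
        "token_embd.weight", "output.weight", "blk.0.attn_q.weight",
    };
    // tensors found in the file, including ones some converters emit but the graph never uses
    const std::vector<std::string> in_file = {
        "token_embd.weight", "output.weight", "blk.0.attn_q.weight", "rope_freqs.extra",
    };

    for (const auto & name : in_file) {
        if (used.count(name) == 0) {
            printf("skipping unused tensor: %s\n", name.c_str());
            continue;  // do not allocate a buffer or read its data
        }
        printf("loading tensor: %s\n", name.c_str());
    }
}
```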

b4750

21 Feb 08:25
0b3863f
MUSA: support ARM64 and enable dp4a, etc. (#11843)

* MUSA: support ARM64 and enable __dp4a, etc.

* fix cross entropy loss op for musa

* update

* add cc info log for musa

* add comment for the MUSA .cc calculation block

---------

Co-authored-by: Bodhi Hu <[email protected]>
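
For reference, a portable scalar equivalent of the __dp4a operation this release enables, assuming the usual CUDA-style semantics (dot product of four packed signed 8-bit lanes plus an accumulator); the MUSA build maps this to a hardware instruction instead:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Scalar reference: acc + sum of per-byte signed products of a and b.
static int32_t dp4a_ref(int32_t a, int32_t b, int32_t c) {
    const uint32_t ua = uint32_t(a);
    const uint32_t ub = uint32_t(b);
    int32_t acc = c;
    for (int i = 0; i < 4; ++i) {
        const int8_t ai = int8_t((ua >> (8 * i)) & 0xFF);
        const int8_t bi = int8_t((ub >> (8 * i)) & 0xFF);
        acc += int32_t(ai) * int32_t(bi);
    }
    return acc;
}

int main() {
    const int8_t av[4] = { 1, -2,  3, -4 };
    const int8_t bv[4] = { 5,  6, -7,  8 };
    int32_t a, b;
    std::memcpy(&a, av, 4);
    std::memcpy(&b, bv, 4);
    printf("dp4a = %d\n", dp4a_ref(a, b, 0));  // 1*5 + (-2)*6 + 3*(-7) + (-4)*8 = -60
}
```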

b4749

21 Feb 06:48
ee02ad0
clip : fix visual encoders with no CLS (#11982)

Signed-off-by: Alex-Brooks <[email protected]>

b4747

20 Feb 13:46
c5d91a7
ggml-cpu: Add CPU backend support for KleidiAI library (#11390)

* ggml-cpu: Add CPU backend support for KleidiAI library

* Add environment variable GGML_KLEIDIAI_SME

* Add support for multithread LHS conversion

* Switch kernel selection order to dotprod and i8mm

* updates for review comments

* More updates for review comments

* Reorganize and rename KleidiAI files

* Move ggml-cpu-traits.h to source file

* Update cmake for SME build and add alignment for SME

* Stop appending GGML_USE_CPU_KLEIDIAI to the GGML_CDEF_PUBLIC list
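
A hypothetical sketch of the kind of runtime kernel selection the bullets above describe, with SME additionally gated by the GGML_KLEIDIAI_SME environment variable; the feature checks and preference order are illustrative, not the actual ggml-cpu logic:

```cpp
#include <cstdio>
#include <cstdlib>

enum class kai_kernel { sme, i8mm, dotprod, generic };

static kai_kernel select_kernel(bool has_sme, bool has_i8mm, bool has_dotprod) {
    const char * sme_env = std::getenv("GGML_KLEIDIAI_SME");
    const bool   use_sme = has_sme && sme_env && std::atoi(sme_env) != 0;  // assumed opt-in
    if (use_sme)     return kai_kernel::sme;
    if (has_dotprod) return kai_kernel::dotprod;  // preference order here is illustrative
    if (has_i8mm)    return kai_kernel::i8mm;
    return kai_kernel::generic;
}

int main() {
    const kai_kernel k = select_kernel(false, true, true);
    printf("selected kernel: %d\n", int(k));
}
```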

b4746

20 Feb 10:47
4806498
ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (#11917)

* Added SVE Implementation for Q3_K Kernel in ggml-cpu-quants.c file

* Improved formatting of code in ggml-cpu-quants.c file

* style : minor fixes

* style : less whitespace

* style : ptr spacing

---------

Co-authored-by: vithulep <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>