Releases · ngxson/llama.cpp
b4770
metal : copy kernels for quant to F32/F16 conversions (#12017)

* metal: use dequantize_q templates

Co-authored-by: Georgi Gerganov <[email protected]>
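For context, the Q4_0 blocks these copy kernels consume pack 32 weights behind a single shared scale. Below is a minimal CPU-side sketch of the same dequantization, following ggml's Q4_0 layout; the scale is simplified to a `float` here (ggml stores it as fp16), so this illustrates the math, not the Metal kernel itself:

```cpp
// Q4_0 -> F32 dequantization sketch. Each block holds 32 signed 4-bit quants
// (two per byte) and one scale; value = scale * (quant - 8).
#include <cstdint>
#include <cstddef>

constexpr int QK4_0 = 32;

struct block_q4_0 {
    float   d;      // per-block scale (fp16 in ggml proper)
    uint8_t qs[16]; // 32 quants, low nibble = first half, high nibble = second half
};

void dequantize_row_q4_0_ref(const block_q4_0 *x, float *y, size_t n_blocks) {
    for (size_t i = 0; i < n_blocks; ++i) {
        for (int j = 0; j < QK4_0 / 2; ++j) {
            const int lo = (x[i].qs[j] & 0x0F) - 8; // first 16 values
            const int hi = (x[i].qs[j] >>   4) - 8; // second 16 values
            y[i * QK4_0 + j]             = lo * x[i].d;
            y[i * QK4_0 + j + QK4_0 / 2] = hi * x[i].d;
        }
    }
}
```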
b4769
opencl: fix for small models (#11950)

* opencl: fix small-shape gemv, remove unused extensions
* opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size
* opencl: fix for token length < 4
* opencl: use wave size of 64 for all Adreno GPUs

Co-authored-by: Shawn Gu <[email protected]>
Co-authored-by: Skyler Szot <[email protected]>
b4768
llava : Add Granite Vision Support (#11794)

* Add super WIP scripts for multimodal granite gguf
* Add example for converting mmgranite to gguf
* Remove hardcoded path
* Add vision feature layer to gguf params
* Clean up llava surgery and remove name substitution hacks
* Add transformers llava next tensor name mapping
* Make siglip / openclip mutually exclusive
* Fix projector linear substitution
* Fix linear 2 substitution index
* Increase max flattened gridpoints to 64
* Fix hardcoded concat for multiple feature layers
* Pull vision feature layers out of gguf keys
* Fix num gridpoints and use all layers
* Avoid dropping last image encoder layer in llava models
* Use 10 for max number of patches
* Standardize vision feature layers
* Clean up logs
* Update comment for vision feature layer init
* Update notes for alternative to legacy llm conversion script
* Fix notes rendering
* Add v prefix to vision feature layer log
* Use current defaults for feature layer
* Use constant for max gridpoints / feat layers, style fixes
* Clarify non-negative feature layers
* Remove CLIP_API from func signature
* Use MAX_IMAGE_FEATURE_LAYERS const in layer calc
* Clarify feature layers are non-negative ints and not uint
* Fix condition for reading feature layers
* Pop last llava layer when feature layers are unset
* Fix unset vision layer 0
* Update examples/llava/clip.cpp
* Re-enable assertion for out-of-bounds get_rows
* Use std::vector for gridpoints and feature layers
* Calculate max feature layer at load time
* Include base patch for granite vision allocation
* Fix trailing whitespace
* Add max num patches = 10 back for minicpmv
* Use unordered set to store feature layers
* Use max feature layer for postnorm
* Apply suggestions from code review

Signed-off-by: Alex-Brooks <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>
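Several of the bullets above fit together as one piece of bookkeeping: store the requested vision feature layers in an unordered set, reject negative indices, and compute the deepest requested layer once at load time so the encoder knows how far it must run. A minimal sketch of that logic; the struct and function names, and the value of `MAX_IMAGE_FEATURE_LAYERS`, are illustrative assumptions, not clip.cpp's actual code:

```cpp
#include <unordered_set>
#include <algorithm>
#include <vector>

constexpr int MAX_IMAGE_FEATURE_LAYERS = 4; // cap assumed for this sketch

struct clip_vision_hparams {
    std::unordered_set<int> vision_feature_layers; // non-negative layer indices
    int max_feature_layer = -1;                    // computed once at load time
};

void set_feature_layers(clip_vision_hparams &hp, const std::vector<int> &layers) {
    hp.vision_feature_layers.clear();
    for (int l : layers) {
        if (l < 0) continue; // feature layers are non-negative ints
        if ((int) hp.vision_feature_layers.size() >= MAX_IMAGE_FEATURE_LAYERS) break;
        hp.vision_feature_layers.insert(l);
    }
    // deepest layer whose hidden state must be kept; -1 means "unset", in
    // which case a default such as the last encoder layer would apply
    hp.max_feature_layer = -1;
    for (int l : hp.vision_feature_layers) {
        hp.max_feature_layer = std::max(hp.max_feature_layer, l);
    }
}
```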
b4767
[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)

* Optimize performance by reordering data for Intel GPU
* Detect hardware type, save the opt feature, and print it
* Correct name
* Optimize the graph once when computing it; record the opt status in tensor->extra so CI passes
* Add env variable GGML_SYCL_DISABLE_OPT for debugging
* Use syclex::architecture instead of the custom hw define; update the guide for GGML_SYCL_DISABLE_OPT
* Add performance data
* Move getrows functions to separate files
* Fix global variables

Co-authored-by: arthw <[email protected]>
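A common pattern for a debug switch like `GGML_SYCL_DISABLE_OPT` is a cached `getenv` lookup that gates the optimized path. A hedged sketch of how such a gate might look; the helper name and exact on/off semantics are assumptions, not the backend's real control flow:

```cpp
#include <cstdlib>
#include <cstring>

// Returns true when the reorder optimization should be skipped. The lookup
// is cached in a function-local static so the hot path calls getenv once.
static bool sycl_opt_disabled() {
    static const bool disabled = [] {
        const char *v = std::getenv("GGML_SYCL_DISABLE_OPT");
        return v != nullptr && std::strcmp(v, "0") != 0;
    }();
    return disabled;
}
```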
b4754
llama.swiftui : add "Done" dismiss button to help view (#11998)

This commit updates the help view in the llama.swiftui example to use a NavigationView and a Done button to dismiss the help view. The motivation for this change is that without it there is no way to dismiss the help view.
b4753
llama : skip loading unused tensors (#12004)

* llama : assign unknown/unused tensors to host buffer type
* llama : skip unused tensors
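Sketched below is the two-part policy the bullets describe: tensors the compute graph never touches are skipped outright, and tensors with unrecognized names are parked on a host buffer type instead of consuming device memory. All names here are hypothetical; this is not llama.cpp's loader code:

```cpp
#include <string>
#include <unordered_set>

enum class buft { device, host };

struct tensor_meta { std::string name; };

// Decide whether to load a tensor at all, and if so, on which buffer type.
bool plan_tensor(const tensor_meta &t,
                 const std::unordered_set<std::string> &known_names,
                 const std::unordered_set<std::string> &used_by_graph,
                 buft &out) {
    if (used_by_graph.count(t.name) == 0) {
        return false; // unused tensor: skip loading entirely
    }
    out = known_names.count(t.name) ? buft::device
                                    : buft::host; // unknown: cheap host buffer
    return true;
}
```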
b4750
MUSA: support ARM64 and enable __dp4a etc. (#11843)

* MUSA: support ARM64 and enable __dp4a etc.
* Fix cross entropy loss op for MUSA
* Update
* Add cc info log for MUSA
* Add a comment for the MUSA .cc calculation block

Co-authored-by: Bodhi Hu <[email protected]>
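`__dp4a` is the 4-way int8 dot-product-and-accumulate intrinsic that quantized matmul kernels lean on. Its scalar equivalent is shown below as a portable reference; this is the generic semantics of the intrinsic, not MUSA-specific code:

```cpp
#include <cstdint>

// dp4a(a, b, c): treat a and b as four packed signed 8-bit lanes, take their
// dot product, and add it to the 32-bit accumulator c.
static int32_t dp4a_ref(int32_t a, int32_t b, int32_t c) {
    int32_t acc = c;
    for (int i = 0; i < 4; ++i) {
        const int8_t ai = static_cast<int8_t>(static_cast<uint32_t>(a) >> (8 * i));
        const int8_t bi = static_cast<int8_t>(static_cast<uint32_t>(b) >> (8 * i));
        acc += static_cast<int32_t>(ai) * static_cast<int32_t>(bi);
    }
    return acc;
}
```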
b4749
clip : fix visual encoders with no CLS (#11982)

Signed-off-by: Alex-Brooks <[email protected]>
b4747
ggml-cpu: Add CPU backend support for KleidiAI library (#11390)

* ggml-cpu: Add CPU backend support for KleidiAI library
* Add environment variable GGML_KLEIDIAI_SME
* Add support for multithreaded LHS conversion
* Switch kernel selection order to dotprod and i8mm
* Updates for review comments
* More updates for review comments
* Reorganize and rename KleidiAI files
* Move ggml-cpu-traits.h to source file
* Update cmake for SME build and add alignment for SME
* Stop appending GGML_USE_CPU_KLEIDIAI to the GGML_CDEF_PUBLIC list
b4746
ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (#11917)

* Added SVE implementation for the Q3_K kernel in ggml-cpu-quants.c
* Improved formatting of code in ggml-cpu-quants.c
* style : minor fixes
* style : less whitespace
* style : pointer spacing

Co-authored-by: vithulep <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
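Structurally, a q3_K × q8_K vector dot multiplies integer quants super-block by super-block and scales each integer sum by both blocks' floating-point scales. The sketch below shows only that skeleton, with the quants already unpacked; the real kernel additionally decodes Q3_K's bit-packed layout (3-bit quants, separate high-bit mask, 6-bit sub-block scales) and vectorizes the inner loop with SVE:

```cpp
#include <cstdint>
#include <cstddef>

constexpr int QK_K = 256; // values per super-block in the K-quants

// Unpacked stand-ins for the real bit-packed blocks, for clarity only.
struct block_q3_ref { float d; int8_t q[QK_K]; };
struct block_q8_ref { float d; int8_t q[QK_K]; };

float vec_dot_q3_q8_ref(const block_q3_ref *x, const block_q8_ref *y,
                        size_t n_blocks) {
    float sum = 0.0f;
    for (size_t b = 0; b < n_blocks; ++b) {
        int32_t isum = 0;
        for (int i = 0; i < QK_K; ++i) {
            isum += int32_t(x[b].q[i]) * int32_t(y[b].q[i]); // integer dot
        }
        sum += x[b].d * y[b].d * isum; // rescale to float
    }
    return sum;
}
```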