Releases · ngxson/llama.cpp
b4770
metal : copy kernels for quant to F32/F16 conversions (#12017)

* metal: use dequantize_q templates

Co-authored-by: Georgi Gerganov <[email protected]>
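For context, the Q4_0 blocks these copy kernels consume pack 32 weights behind a single shared scale. Below is a minimal CPU-side sketch of the same dequantization, following ggml's Q4_0 layout; the scale is simplified to a `float` here (ggml stores it as fp16), so this illustrates the math, not the Metal kernel itself:

```cpp
// Q4_0 -> F32 dequantization sketch. Each block holds 32 signed 4-bit quants
// (two per byte) and one scale; value = scale * (quant - 8).
#include <cstdint>
#include <cstddef>

constexpr int QK4_0 = 32;

struct block_q4_0 {
    float   d;      // per-block scale (fp16 in ggml proper)
    uint8_t qs[16]; // 32 quants, low nibble = first half, high nibble = second half
};

void dequantize_row_q4_0_ref(const block_q4_0 *x, float *y, size_t n_blocks) {
    for (size_t i = 0; i < n_blocks; ++i) {
        for (int j = 0; j < QK4_0 / 2; ++j) {
            const int lo = (x[i].qs[j] & 0x0F) - 8; // first 16 values
            const int hi = (x[i].qs[j] >>   4) - 8; // second 16 values
            y[i * QK4_0 + j]             = lo * x[i].d;
            y[i * QK4_0 + j + QK4_0 / 2] = hi * x[i].d;
        }
    }
}
```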
b4769
opencl: fix for small models (#11950)

* opencl: fix small-shape gemv, remove unused extensions
* opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size
* opencl: fix for token length < 4
* opencl: use wave size of 64 for all Adreno GPUs

Co-authored-by: Shawn Gu <[email protected]>
Co-authored-by: Skyler Szot <[email protected]>
b4768
llava : Add Granite Vision Support (#11794)

* Add super WIP scripts for multimodal granite gguf
* Add example for converting mmgranite to gguf
* Remove hardcoded path
* Add vision feature layer to gguf params
* Clean up llava surgery and remove name substitution hacks
* Add transformers llava next tensor name mapping
* Make siglip / openclip mutually exclusive
* Fix projector linear substitution
* Fix linear 2 substitution index
* Increase max flattened gridpoints to 64
* Fix hardcoded concat for multiple feature layers
* Pull vision feature layers out of gguf keys
* Fix num gridpoints and use all layers
* Avoid dropping last image encoder layer in llava models
* Use 10 for max number of patches
* Standardize vision feature layers
* Clean up logs
* Update comment for vision feature layer init
* Update notes for alternative to legacy llm conversion script
* Fix notes rendering
* Add v prefix to vision feature layer log
* Use current defaults for feature layer
* Use constant for max gridpoints / feat layers, style fixes
* Clarify non-negative feature layers
* Remove CLIP_API from func signature
* Use MAX_IMAGE_FEATURE_LAYERS const in layer calc
* Clarify feature layers are non-negative ints and not uint
* Fix condition for reading feature layers
* Pop last llava layer when feature layers are unset
* Fix unset vision layer 0
* Update examples/llava/clip.cpp
* Re-enable assertion for out-of-bounds get_rows
* Use std::vector for gridpoints and feature layers
* Calculate max feature layer at load time
* Include base patch for granite vision allocation
* Fix trailing whitespace
* Add max num patches = 10 back for minicpmv
* Use unordered set to store feature layers
* Use max feature layer for postnorm
* Apply suggestions from code review

Signed-off-by: Alex-Brooks <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>
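Several of the bullets above fit together as one piece of bookkeeping: store the requested vision feature layers in an unordered set, reject negative indices, and compute the deepest requested layer once at load time so the encoder knows how far it must run. A minimal sketch of that logic; the struct and function names, and the value of `MAX_IMAGE_FEATURE_LAYERS`, are illustrative assumptions, not clip.cpp's actual code:

```cpp
#include <unordered_set>
#include <algorithm>
#include <vector>

constexpr int MAX_IMAGE_FEATURE_LAYERS = 4; // cap assumed for this sketch

struct clip_vision_hparams {
    std::unordered_set<int> vision_feature_layers; // non-negative layer indices
    int max_feature_layer = -1;                    // computed once at load time
};

void set_feature_layers(clip_vision_hparams &hp, const std::vector<int> &layers) {
    hp.vision_feature_layers.clear();
    for (int l : layers) {
        if (l < 0) continue; // feature layers are non-negative ints
        if ((int) hp.vision_feature_layers.size() >= MAX_IMAGE_FEATURE_LAYERS) break;
        hp.vision_feature_layers.insert(l);
    }
    // deepest layer whose hidden state must be kept; -1 means "unset", in
    // which case a default such as the last encoder layer would apply
    hp.max_feature_layer = -1;
    for (int l : hp.vision_feature_layers) {
        hp.max_feature_layer = std::max(hp.max_feature_layer, l);
    }
}
```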
b4767
[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)

* Optimize performance by reordering data for Intel GPU
* Detect hardware type, save the opt feature, and print it
* Correct name
* Optimize the graph once when computing it; record the opt status in tensor->extra so CI passes
* Add env variable GGML_SYCL_DISABLE_OPT for debugging
* Use syclex::architecture instead of the custom hw define; update the guide for GGML_SYCL_DISABLE_OPT
* Add performance data
* Move getrows functions to separate files
* Fix global variables

Co-authored-by: arthw <[email protected]>
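A common pattern for a debug switch like `GGML_SYCL_DISABLE_OPT` is a cached `getenv` lookup that gates the optimized path. A hedged sketch of how such a gate might look; the helper name and exact on/off semantics are assumptions, not the backend's real control flow:

```cpp
#include <cstdlib>
#include <cstring>

// Returns true when the reorder optimization should be skipped. The lookup
// is cached in a function-local static so the hot path calls getenv once.
static bool sycl_opt_disabled() {
    static const bool disabled = [] {
        const char *v = std::getenv("GGML_SYCL_DISABLE_OPT");
        return v != nullptr && std::strcmp(v, "0") != 0;
    }();
    return disabled;
}
```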
b4754
llama.swiftui : add "Done" dismiss button to help view (#11998)

This commit updates the help view in the llama.swiftui example to use a NavigationView and a Done button to dismiss the help view. The motivation for this change is that without it there is no way to dismiss the help view.
b4753
llama : skip loading unused tensors (#12004)

* llama : assign unknown/unused tensors to host buffer type
* llama : skip unused tensors
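Sketched below is the two-part policy the bullets describe: tensors the compute graph never touches are skipped outright, and tensors with unrecognized names are parked on a host buffer type instead of consuming device memory. All names here are hypothetical; this is not llama.cpp's loader code:

```cpp
#include <string>
#include <unordered_set>

enum class buft { device, host };

struct tensor_meta { std::string name; };

// Decide whether to load a tensor at all, and if so, on which buffer type.
bool plan_tensor(const tensor_meta &t,
                 const std::unordered_set<std::string> &known_names,
                 const std::unordered_set<std::string> &used_by_graph,
                 buft &out) {
    if (used_by_graph.count(t.name) == 0) {
        return false; // unused tensor: skip loading entirely
    }
    out = known_names.count(t.name) ? buft::device
                                    : buft::host; // unknown: cheap host buffer
    return true;
}
```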
b4750
MUSA: support ARM64 and enable __dp4a etc. (#11843)

* MUSA: support ARM64 and enable __dp4a etc.
* Fix cross entropy loss op for MUSA
* Update
* Add cc info log for MUSA
* Add a comment for the MUSA .cc calculation block

Co-authored-by: Bodhi Hu <[email protected]>
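`__dp4a` is the 4-way int8 dot-product-and-accumulate intrinsic that quantized matmul kernels lean on. Its scalar equivalent is shown below as a portable reference; this is the generic semantics of the intrinsic, not MUSA-specific code:

```cpp
#include <cstdint>

// dp4a(a, b, c): treat a and b as four packed signed 8-bit lanes, take their
// dot product, and add it to the 32-bit accumulator c.
static int32_t dp4a_ref(int32_t a, int32_t b, int32_t c) {
    int32_t acc = c;
    for (int i = 0; i < 4; ++i) {
        const int8_t ai = static_cast<int8_t>(static_cast<uint32_t>(a) >> (8 * i));
        const int8_t bi = static_cast<int8_t>(static_cast<uint32_t>(b) >> (8 * i));
        acc += static_cast<int32_t>(ai) * static_cast<int32_t>(bi);
    }
    return acc;
}
```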
b4749
clip : fix visual encoders with no CLS (#11982)

Signed-off-by: Alex-Brooks <[email protected]>
b4747
ggml-cpu: Add CPU backend support for KleidiAI library (#11390)

* ggml-cpu: Add CPU backend support for KleidiAI library
* Add environment variable GGML_KLEIDIAI_SME
* Add support for multithreaded LHS conversion
* Switch kernel selection order to dotprod and i8mm
* Updates for review comments
* More updates for review comments
* Reorganize and rename KleidiAI files
* Move ggml-cpu-traits.h to source file
* Update cmake for SME build and add alignment for SME
* Stop appending GGML_USE_CPU_KLEIDIAI to the GGML_CDEF_PUBLIC list
b4746
ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (#11917)

* Added SVE implementation for the Q3_K kernel in ggml-cpu-quants.c
* Improved formatting of code in ggml-cpu-quants.c
* style : minor fixes
* style : less whitespace
* style : pointer spacing

Co-authored-by: vithulep <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
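Structurally, a q3_K × q8_K vector dot multiplies integer quants super-block by super-block and scales each integer sum by both blocks' floating-point scales. The sketch below shows only that skeleton, with the quants already unpacked; the real kernel additionally decodes Q3_K's bit-packed layout (3-bit quants, separate high-bit mask, 6-bit sub-block scales) and vectorizes the inner loop with SVE:

```cpp
#include <cstdint>
#include <cstddef>

constexpr int QK_K = 256; // values per super-block in the K-quants

// Unpacked stand-ins for the real bit-packed blocks, for clarity only.
struct block_q3_ref { float d; int8_t q[QK_K]; };
struct block_q8_ref { float d; int8_t q[QK_K]; };

float vec_dot_q3_q8_ref(const block_q3_ref *x, const block_q8_ref *y,
                        size_t n_blocks) {
    float sum = 0.0f;
    for (size_t b = 0; b < n_blocks; ++b) {
        int32_t isum = 0;
        for (int i = 0; i < QK_K; ++i) {
            isum += int32_t(x[b].q[i]) * int32_t(y[b].q[i]); // integer dot
        }
        sum += x[b].d * y[b].d * isum; // rescale to float
    }
    return sum;
}
```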