Releases: ngxson/llama.cpp
b4985
llama: fix error on bad grammar (#12628)
b4984
server : include speculative decoding stats when timings_per_token is…
b4983
rpc : update README for cache usage (#12620)
b4981
rpc : send hash when tensor data is above some fixed threshold (#12496)
* rpc : send hash when tensor data is above some fixed threshold (ref #10095)
* rpc : put cache under $HOME/.cache/llama.cpp
* try to fix win32 build
* another try to fix win32 build
* remove llama as dependency
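A minimal C++ sketch of the idea behind this change, under stated assumptions: the client sends only a hash for payloads larger than a threshold, and the server resolves that hash against a content-addressed cache. The 10 MiB threshold, the use of 64-bit FNV-1a, the struct and function names, and the in-memory map standing in for the on-disk cache are all illustrative assumptions, not the actual ggml-rpc code or wire format. Only the base cache path ($HOME/.cache/llama.cpp) comes from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Assumed threshold: payloads at or below this size are sent inline.
constexpr size_t kHashThreshold = 10 * 1024 * 1024;

// 64-bit FNV-1a, used here only as an illustrative content hash.
uint64_t fnv1a64(const std::vector<uint8_t>& data) {
    uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (uint8_t b : data) {
        h ^= b;
        h *= 0x100000001b3ULL;           // FNV prime
    }
    return h;
}

// Hypothetical message for one "set tensor" request.
struct SetTensorMsg {
    bool by_hash;               // true: only the hash goes over the wire
    uint64_t hash;              // meaningful when by_hash is true
    std::vector<uint8_t> data;  // meaningful when by_hash is false
};

// Client side: large payloads are replaced by their hash.
SetTensorMsg make_set_tensor_msg(const std::vector<uint8_t>& data) {
    if (data.size() > kHashThreshold) {
        return {true, fnv1a64(data), {}};
    }
    return {false, 0, data};
}

// Server side: content-addressed cache keyed by hash (in memory here).
struct ServerCache {
    std::unordered_map<uint64_t, std::vector<uint8_t>> blobs;

    void put(const std::vector<uint8_t>& data) { blobs[fnv1a64(data)] = data; }

    // Returns nullptr on a miss; the client would then have to send the data.
    const std::vector<uint8_t>* lookup(uint64_t hash) const {
        auto it = blobs.find(hash);
        return it == blobs.end() ? nullptr : &it->second;
    }
};

// Base cache directory as described in the commit: $HOME/.cache/llama.cpp
std::string cache_dir_from_home(const std::string& home) {
    assert(!home.empty() && "sketch expects a non-empty HOME");
    return home + "/.cache/llama.cpp";
}
```

On a cache miss the server cannot reconstruct the tensor from the hash alone, so a real protocol needs a fallback that resends the full data; the sketch only signals the miss by returning nullptr.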
b4978
opencl: add multi and vision rope, `gelu_quick` and `im2col` (#12600)
* opencl: add `im2col`
* opencl: add `gelu_quick`
* opencl: add mrope
* opencl: add vision rope
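For reference, a scalar C++ sketch of two of the newly added ops. `gelu_quick` is written as x * sigmoid(1.702 * x), the usual quick-GELU approximation; `im2col` unrolls each convolution window into one row so that a 2D convolution can run as a matrix multiplication. The [C][H][W] input layout, the [OH*OW][C*KH*KW] output layout, the single stride/padding value, and the absence of dilation are simplifications chosen for this sketch, not details taken from the OpenCL kernels.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Quick GELU approximation: x * sigmoid(1.702 * x).
float gelu_quick(float x) {
    return x / (1.0f + std::exp(-1.702f * x));
}

// im2col for one image stored row-major as [C][H][W], with zero padding.
// Output is [OH*OW][C*KH*KW]: row r holds the receptive field of output
// pixel r, so convolution becomes (this matrix) x ([C*KH*KW][OC] weights).
std::vector<float> im2col(const std::vector<float>& in, int C, int H, int W,
                          int KH, int KW, int stride, int pad,
                          int& OH, int& OW) {
    assert(static_cast<int>(in.size()) == C * H * W);
    OH = (H + 2 * pad - KH) / stride + 1;
    OW = (W + 2 * pad - KW) / stride + 1;
    const size_t cols = static_cast<size_t>(C) * KH * KW;
    std::vector<float> out(static_cast<size_t>(OH) * OW * cols, 0.0f);
    for (int oy = 0; oy < OH; ++oy)
        for (int ox = 0; ox < OW; ++ox)
            for (int c = 0; c < C; ++c)
                for (int ky = 0; ky < KH; ++ky)
                    for (int kx = 0; kx < KW; ++kx) {
                        const int iy = oy * stride - pad + ky;
                        const int ix = ox * stride - pad + kx;
                        // Out-of-bounds taps read the zero padding.
                        if (iy < 0 || iy >= H || ix < 0 || ix >= W) continue;
                        const size_t row = static_cast<size_t>(oy) * OW + ox;
                        const size_t col = (static_cast<size_t>(c) * KH + ky) * KW + kx;
                        out[row * cols + col] =
                            in[(static_cast<size_t>(c) * H + iy) * W + ix];
                    }
    return out;
}
```

A GPU backend typically assigns one work-item per output element instead of running the nested loops, but the index arithmetic is the same.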
b4977
llama : add PLM GGUF Conversion & Inference Support (#12457)
* add edgellm model arch [conversation feature doesn't work]
* remove output.weight layer for edgellm arch
* [Model] update the name of the model
* update the name of model arch in convert gguf
* [Model] Refactor the model arch into llama-model
* [Bug] Fix the bug in create attn kv
* [Code] Fix editorconfig errors
* [Code] Remove Trailing whitespace
* [Code] Remove Trailing whitespace
* [Code] Change the order of model arch in list
* [Code] Fix flake8 Lint errors
* Remove trailing white space
* [Code] Remove call in model arch
b4974
sync : ggml ggml-ci
b4972
sync : ggml ggml-ci
b4970
llamafile : ppc64le MMA implementation for Q4_0 (#12489)
This change upstreams llamafile's CPU matrix multiplication kernels for the ppc64le ISA using MMA builtins. The patch handles matrix multiplication between the quantised datatypes block_q4_0 and block_q8_0. It results in a 5% to 50% improvement in total speed (i.e., all tokens / total time) across various batch sizes. The patch was tested with the Meta-Llama-3-8B, Mistral-7B, and Llama-2-7B-chat-hf models on an IBM POWER10 machine.
Signed-off-by: Amrita H S
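The MMA kernels accelerate dot products between Q4_0 and Q8_0 blocks. A scalar C++ reference of that dot product is sketched here, assuming ggml's 32-element blocks in which each Q4_0 byte packs element j in its low nibble and element j + 16 in its high nibble, both stored with an offset of 8. Scales are plain floats for self-containment (ggml stores them as fp16), and the type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int QK = 32;  // elements per Q4_0 / Q8_0 block

// Simplified layouts: float scale instead of ggml's fp16.
struct BlockQ4_0 {
    float d;             // block scale
    uint8_t qs[QK / 2];  // 32 x 4-bit values, two per byte
};

struct BlockQ8_0 {
    float d;             // block scale
    int8_t qs[QK];       // 32 x 8-bit values
};

// Scalar dot product of n elements: x in Q4_0, y in Q8_0.
float vec_dot_q4_0_q8_0(int n, const std::vector<BlockQ4_0>& x,
                        const std::vector<BlockQ8_0>& y) {
    assert(n % QK == 0);
    const int nb = n / QK;
    assert(static_cast<int>(x.size()) >= nb && static_cast<int>(y.size()) >= nb);
    float sum = 0.0f;
    for (int b = 0; b < nb; ++b) {
        int32_t sumi = 0;  // integer accumulation within one block
        for (int j = 0; j < QK / 2; ++j) {
            // Low nibble -> element j, high nibble -> element j + 16, offset 8.
            const int v0 = (x[b].qs[j] & 0x0F) - 8;
            const int v1 = (x[b].qs[j] >> 4) - 8;
            sumi += v0 * y[b].qs[j] + v1 * y[b].qs[j + QK / 2];
        }
        // Apply both block scales once per block.
        sum += static_cast<float>(sumi) * x[b].d * y[b].d;
    }
    return sum;
}
```

An MMA or SIMD kernel computes the same per-block integer sums, but over many blocks and output columns at once; a scalar version like this is mainly useful as a correctness oracle when validating such kernels.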
b4969
ggml : riscv: add 128-bit RVV support (#12530)
* ggml : add 128-bit RVV support
* ggml : revert to old RVV 256+ q2_K, q3_K, q4_K, q6_K impl
* remove trailing whitespaces
* restructure vector length selection code
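Supporting both 128-bit and 256-bit-plus RVV means choosing a kernel from the hardware vector length (VLEN). A minimal C++ sketch of such a selection step follows. The enum and function names are hypothetical, and the sketch takes VLEN in bits as a plain parameter so it builds on any target; on RVV hardware the VLEN in bytes can be read with the `__riscv_vlenb()` intrinsic from `<riscv_vector.h>`.

```cpp
#include <cassert>

// Hypothetical kernel variants, mirroring the 128-bit vs 256+ split.
enum class QKernel { Scalar, Rvv128, Rvv256Plus };

// Select a kernel from the vector length in bits.
QKernel select_kernel(int vlen_bits) {
    assert(vlen_bits >= 0);
    if (vlen_bits >= 256) return QKernel::Rvv256Plus;
    if (vlen_bits == 128) return QKernel::Rvv128;
    return QKernel::Scalar;  // no RVV, or a VLEN with no dedicated kernel
}
```

Doing the check once and dispatching to a whole kernel keeps the inner loops free of per-iteration branches on vector length.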