Releases: ngxson/llama.cpp

b4985

28 Mar 18:06
dd373dd
llama: fix error on bad grammar (#12628)
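
A minimal sketch of what the fix concerns from the caller's side, assuming the grammar sampler constructor signals a bad GBNF string by returning NULL (the helper name and the error handling are illustrative, not from the patch):

```c
#include <stdbool.h>
#include <stdio.h>
#include "llama.h"

// Illustrative helper: probe a GBNF grammar instead of assuming
// llama_sampler_init_grammar always succeeds.
static bool try_grammar(const struct llama_vocab * vocab, const char * gbnf) {
    struct llama_sampler * smpl = llama_sampler_init_grammar(vocab, gbnf, "root");
    if (smpl == NULL) {
        fprintf(stderr, "grammar rejected, falling back to unconstrained sampling\n");
        return false;
    }
    llama_sampler_free(smpl);
    return true;
}
```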

b4984

28 Mar 08:53
5d01670
server : include speculative decoding stats when timings_per_token is…
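
The timings_per_token request flag already exists in llama-server; this change adds speculative-decoding statistics to the per-token timing output. A hedged usage example (the port, prompt, and the exact names of the added stats fields are not taken from the patch):

```sh
# opt in to per-token timings; with a draft model loaded, the response's
# timings object should now also carry speculative decoding stats
curl http://localhost:8080/completion -d '{
  "prompt": "Hello",
  "n_predict": 16,
  "timings_per_token": true
}'
```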

b4983

28 Mar 08:29
ef03229
rpc : update README for cache usage (#12620)

b4981

28 Mar 07:01
ab6ab8f
rpc : send hash when tensor data is above some fixed threshold (#12496)

* rpc : send hash when tensor data is above some fixed threshold

ref #10095

* rpc : put cache under $HOME/.cache/llama.cpp

* try to fix win32 build

* another try to fix win32 build

* remove llama as dependency
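
A sketch of the idea behind this release, with illustrative names and threshold; the backend's real wire format, hash function, and cutoff are defined in the PR:

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_THRESHOLD (1024 * 1024) /* assumed cutoff, not the real value */

/* FNV-1a as a stand-in content hash */
static uint64_t fnv1a(const uint8_t * data, size_t n) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < n; i++) {
        h = (h ^ data[i]) * 0x100000001b3ULL;
    }
    return h;
}

/* Illustrative send path: large tensors go out as a hash first; the server
   answers "miss" if the payload is not in $HOME/.cache/llama.cpp, and only
   then is the full buffer transferred. */
void send_tensor(const uint8_t * data, size_t n) {
    if (n > HASH_THRESHOLD) {
        uint64_t h = fnv1a(data, n);
        /* send_hash(h); on a cache miss, fall through to send_raw() */
        (void) h;
    } else {
        /* send_raw(data, n); small tensors are cheaper to send inline */
    }
}
```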

b4978

27 Mar 16:00
5dec47d
opencl: add multi and vision rope, `gelu_quick` and `im2col` (#12600)

* opencl: add `im2col`

* opencl: add `gelu_quick`

* opencl: add mrope

* opencl: add vision rope
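
Of these, gelu_quick has a compact scalar definition in ggml: x * sigmoid(1.702 * x), the cheap GELU approximation the new OpenCL kernel computes per element. For reference:

```c
#include <math.h>

/* ggml's gelu_quick: x * sigmoid(1.702 * x) */
static float gelu_quick(float x) {
    return x / (1.0f + expf(-1.702f * x));
}
```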

b4977

27 Mar 11:46
f125b8d
llama : add PLM GGUF Conversion & Inference Support (#12457)

* add edgellm model arch [conversation feature doesn't work]

* remove output.weight layer for edgellm arch

* [Model] update the name of the model

* update the name of model arch in convert gguf

* [Model] Refactor the model arch into llama-model

* [Bug] Fix the bug in create attn kv

* [Code] Fix editorconfig errors

* [Code] Remove trailing whitespace

* [Code] Remove trailing whitespace

* [Code] Change the order of model arch in list

* [Code] Fix flake8 Lint errors

* Remove trailing whitespace

* [Code] Remove call in model arch
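
With the architecture registered, conversion should follow the standard HF-to-GGUF path; a hedged usage sketch (the checkpoint path and output name are placeholders):

```sh
# convert_hf_to_gguf.py is llama.cpp's standard converter; paths are placeholders
python convert_hf_to_gguf.py /path/to/PLM-checkpoint --outfile plm-f16.gguf --outtype f16
```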

b4974

27 Mar 08:57
sync : ggml

ggml-ci

b4972

27 Mar 08:00
sync : ggml

ggml-ci

b4970

27 Mar 07:40
c7b43ab
llamafile : ppc64le MMA implementation for Q4_0. (#12489)

This change upstreams llamafile's CPU matrix multiplication kernels for the ppc64le ISA using MMA builtins. The patch handles matrix multiplication between the quantised datatypes block_q4_0 and block_q8_0.

This change results in a 5%-50% improvement in total speed (i.e. all tokens / total time) across various batch sizes.

The patch was tested with the Meta-Llama-3-8B, Mistral-7B, and Llama-2-7B-chat-hf models on an IBM POWER10 machine.

Signed-off-by: Amrita H S <[email protected]>
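
For reference, a scalar form of the q4_0 × q8_0 dot product that the MMA kernels tile and accelerate. It mirrors ggml's block layout (32 weights per block, low nibbles first, 4-bit values offset by 8), but the scales are shown as plain floats where ggml stores fp16:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified blocks: ggml stores d as fp16; float is used here for clarity. */
typedef struct { float d; uint8_t qs[16]; } block_q4_0; /* 32 4-bit weights   */
typedef struct { float d; int8_t  qs[32]; } block_q8_0; /* 32 8-bit activations */

/* Scalar dot product over n values (n a multiple of 32), the operation
   the ppc64le MMA kernels compute in tiles. */
static float vec_dot_q4_0_q8_0(size_t n, const block_q4_0 * x, const block_q8_0 * y) {
    float sumf = 0.0f;
    for (size_t i = 0; i < n/32; i++) {
        int sumi = 0;
        for (int j = 0; j < 16; j++) {
            const int v0 = (x[i].qs[j] & 0x0F) - 8; /* elements 0..15  */
            const int v1 = (x[i].qs[j] >>   4) - 8; /* elements 16..31 */
            sumi += v0 * y[i].qs[j] + v1 * y[i].qs[j + 16];
        }
        sumf += (float) sumi * x[i].d * y[i].d;
    }
    return sumf;
}
```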

b4969

27 Mar 07:27
24feaec
ggml : riscv: add 128-bit RVV support (#12530)

* ggml : add 128-bit RVV support

* ggml : revert to old RVV 256+ q2_K, q3_K, q4_K, q6_K impl

* remove trailing whitespaces

* restructure vector length selection code
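
The vector length selection the last commit restructures is RVV's strip-mining idiom: ask vsetvl how many elements this iteration may process, so one source path serves 128-bit and wider VLENs. A minimal sketch using the standard RVV intrinsics (the function and data are illustrative, not from the patch):

```c
#include <stddef.h>
#include <riscv_vector.h>

/* scale n floats in place, letting the hardware pick the strip size */
void scale_f32(float * x, size_t n, float s) {
    size_t i = 0;
    while (i < n) {
        size_t vl = __riscv_vsetvl_e32m1(n - i);        /* elements this pass */
        vfloat32m1_t v = __riscv_vle32_v_f32m1(x + i, vl);
        v = __riscv_vfmul_vf_f32m1(v, s, vl);
        __riscv_vse32_v_f32m1(x + i, v, vl);
        i += vl;
    }
}
```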