Releases: ngxson/llama.cpp

b4985

28 Mar 18:06
dd373dd
llama: fix error on bad grammar (#12628)
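
A minimal sketch of what the fix concerns from the caller's side, assuming the grammar sampler constructor signals a bad GBNF string by returning NULL (the helper name and the error handling are illustrative, not from the patch):

```c
#include <stdbool.h>
#include <stdio.h>
#include "llama.h"

// Illustrative helper: probe a GBNF grammar instead of assuming
// llama_sampler_init_grammar always succeeds.
static bool try_grammar(const struct llama_vocab * vocab, const char * gbnf) {
    struct llama_sampler * smpl = llama_sampler_init_grammar(vocab, gbnf, "root");
    if (smpl == NULL) {
        fprintf(stderr, "grammar rejected, falling back to unconstrained sampling\n");
        return false;
    }
    llama_sampler_free(smpl);
    return true;
}
```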

b4984

28 Mar 08:53
5d01670
server : include speculative decoding stats when timings_per_token is…
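
The timings_per_token request flag already exists in llama-server; this change adds speculative-decoding statistics to the per-token timing output. A hedged usage example (the port, prompt, and the exact names of the added stats fields are not taken from the patch):

```sh
# opt in to per-token timings; with a draft model loaded, the response's
# timings object should now also carry speculative decoding stats
curl http://localhost:8080/completion -d '{
  "prompt": "Hello",
  "n_predict": 16,
  "timings_per_token": true
}'
```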

b4983

28 Mar 08:29
ef03229
rpc : update README for cache usage (#12620)

b4981

28 Mar 07:01
ab6ab8f
rpc : send hash when tensor data is above some fixed threshold (#12496)

* rpc : send hash when tensor data is above some fixed threshold

ref #10095

* rpc : put cache under $HOME/.cache/llama.cpp

* try to fix win32 build

* another try to fix win32 build

* remove llama as dependency
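
A sketch of the idea behind this release, with illustrative names and threshold; the backend's real wire format, hash function, and cutoff are defined in the PR:

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_THRESHOLD (1024 * 1024) /* assumed cutoff, not the real value */

/* FNV-1a as a stand-in content hash */
static uint64_t fnv1a(const uint8_t * data, size_t n) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < n; i++) {
        h = (h ^ data[i]) * 0x100000001b3ULL;
    }
    return h;
}

/* Illustrative send path: large tensors go out as a hash first; the server
   answers "miss" if the payload is not in $HOME/.cache/llama.cpp, and only
   then is the full buffer transferred. */
void send_tensor(const uint8_t * data, size_t n) {
    if (n > HASH_THRESHOLD) {
        uint64_t h = fnv1a(data, n);
        /* send_hash(h); on a cache miss, fall through to send_raw() */
        (void) h;
    } else {
        /* send_raw(data, n); small tensors are cheaper to send inline */
    }
}
```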

b4978

27 Mar 16:00
5dec47d
opencl: add multi and vision rope, `gelu_quick` and `im2col` (#12600)

* opencl: add `im2col`

* opencl: add `gelu_quick`

* opencl: add mrope

* opencl: add vision rope
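
Of these, gelu_quick has a compact scalar definition in ggml: x * sigmoid(1.702 * x), the cheap GELU approximation the new OpenCL kernel computes per element. For reference:

```c
#include <math.h>

/* ggml's gelu_quick: x * sigmoid(1.702 * x) */
static float gelu_quick(float x) {
    return x / (1.0f + expf(-1.702f * x));
}
```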

b4977

27 Mar 11:46
f125b8d
llama : add PLM GGUF Conversion & Inference Support (#12457)

* add edgellm model arch [conversation feature doesn't work]

* remove output.weight layer for edgellm arch

* [Model] update the name of the model

* update the name of model arch in convert gguf

* [Model] Refactor the model arch into llama-model

* [Bug] Fix the bug in create attn kv

* [Code] Fix editorconfig errors

* [Code] Remove trailing whitespace

* [Code] Remove trailing whitespace

* [Code] Change the order of model arch in list

* [Code] Fix flake8 Lint errors

* Remove trailing whitespace

* [Code] Remove call in model arch
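
With the architecture registered, conversion should follow the standard HF-to-GGUF path; a hedged usage sketch (the checkpoint path and output name are placeholders):

```sh
# convert_hf_to_gguf.py is llama.cpp's standard converter; paths are placeholders
python convert_hf_to_gguf.py /path/to/PLM-checkpoint --outfile plm-f16.gguf --outtype f16
```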

b4974

27 Mar 08:57
sync : ggml

ggml-ci

b4972

27 Mar 08:00
sync : ggml

ggml-ci

b4970

27 Mar 07:40
c7b43ab
llamafile : ppc64le MMA implementation for Q4_0. (#12489)

This change upstreams llamafile's CPU matrix multiplication kernels for the ppc64le ISA using MMA builtins. The patch handles matrix multiplication between the quantised datatypes block_q4_0 and block_q8_0.

This change results in a 5%-50% improvement in total speed (i.e. all tokens / total time) across various batch sizes.

The patch was tested with the Meta-Llama-3-8B, Mistral-7B, and Llama-2-7B-chat-hf models on an IBM POWER10 machine.

Signed-off-by: Amrita H S <[email protected]>
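
For reference, a scalar form of the q4_0 × q8_0 dot product that the MMA kernels tile and accelerate. It mirrors ggml's block layout (32 weights per block, low nibbles first, 4-bit values offset by 8), but the scales are shown as plain floats where ggml stores fp16:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified blocks: ggml stores d as fp16; float is used here for clarity. */
typedef struct { float d; uint8_t qs[16]; } block_q4_0; /* 32 4-bit weights   */
typedef struct { float d; int8_t  qs[32]; } block_q8_0; /* 32 8-bit activations */

/* Scalar dot product over n values (n a multiple of 32), the operation
   the ppc64le MMA kernels compute in tiles. */
static float vec_dot_q4_0_q8_0(size_t n, const block_q4_0 * x, const block_q8_0 * y) {
    float sumf = 0.0f;
    for (size_t i = 0; i < n/32; i++) {
        int sumi = 0;
        for (int j = 0; j < 16; j++) {
            const int v0 = (x[i].qs[j] & 0x0F) - 8; /* elements 0..15  */
            const int v1 = (x[i].qs[j] >>   4) - 8; /* elements 16..31 */
            sumi += v0 * y[i].qs[j] + v1 * y[i].qs[j + 16];
        }
        sumf += (float) sumi * x[i].d * y[i].d;
    }
    return sumf;
}
```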

b4969

27 Mar 07:27
24feaec
ggml : riscv: add 128-bit RVV support (#12530)

* ggml : add 128-bit RVV support

* ggml : revert to old RVV 256+ q2_K, q3_K, q4_K, q6_K impl

* remove trailing whitespaces

* restructure vector length selection code
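
The vector length selection the last commit restructures is RVV's strip-mining idiom: ask vsetvl how many elements this iteration may process, so one source path serves 128-bit and wider VLENs. A minimal sketch using the standard RVV intrinsics (the function and data are illustrative, not from the patch):

```c
#include <stddef.h>
#include <riscv_vector.h>

/* scale n floats in place, letting the hardware pick the strip size */
void scale_f32(float * x, size_t n, float s) {
    size_t i = 0;
    while (i < n) {
        size_t vl = __riscv_vsetvl_e32m1(n - i);        /* elements this pass */
        vfloat32m1_t v = __riscv_vle32_v_f32m1(x + i, vl);
        v = __riscv_vfmul_vf_f32m1(v, s, vl);
        __riscv_vse32_v_f32m1(x + i, v, vl);
        i += vl;
    }
}
```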