Releases: ngxson/llama.cpp
b6305
cli : change log to warning to explain reason for stopping (#15604)

* Change to warn instead of debug, to explain reason for stopping.
* Update tools/main/main.cpp: fix printing --2

Co-authored-by: Georgi Gerganov <[email protected]>
b6303
cuda: Add cublasLt_static linking when GGML_STATIC is enabled (#15622)

Prior to this change, compiling 'llama-cli' with GGML_STATIC=ON on Linux failed with undefined cublasLt references. We now link against CUDA::cublasLt_static when the CUDA version is greater than 10.1.
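A minimal sketch of the conditional linking logic this release describes, using CMake's standard CUDAToolkit module (the `ggml-cuda` target name here is an assumption for illustration; only `GGML_STATIC` and `CUDA::cublasLt_static` come from the note above):

```cmake
# Sketch: prefer the static cublasLt library when building fully static.
# CUDA::cublasLt_static is provided by CMake's FindCUDAToolkit module.
find_package(CUDAToolkit REQUIRED)
if (GGML_STATIC AND CUDAToolkit_VERSION VERSION_GREATER "10.1")
    target_link_libraries(ggml-cuda PRIVATE CUDA::cublasLt_static)
else()
    target_link_libraries(ggml-cuda PRIVATE CUDA::cublasLt)
endif()
```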
b6299
kv-cache : better estimate of n_kv for multi-sequence batches (#15610) ggml-ci
b6298
CANN: refactor mask handling and improve performance in FA (#15561)

* Refactored the mask computation in Flash Attention, unifying the logic without separating prefill and decode.
* Optimized performance in non-alibi scenarios by reducing one repeat operation.
* Updated operator management to explicitly mark unsupported cases on 310P devices and when the dim is not divisible by 16.
* Optimized the FA layout from BNSD to BSND.

Signed-off-by: noemotiovon <[email protected]>
b6297
ggml-cpu : add basic RVV support for vector f32 ops (#15057)

* add basic RVV support for vector f32 ops
* add RVV support for f32 softmax
b6295
OpenCL: add fused group_norm/norm, mul, add (#15314)

* add fused group_norm/norm, mul, add
* fix spacing
* revert rms_norm logic
* fix trailing whitespace
b6293
SYCL: fix rms_norm_mul_add for tensor dim not a multiple of sg_size (…
b6291
tests: add performance test for mul mat id (#15543)
b6290
llamafile: PowerPC Sgemm Optimization (#15558)

This patch improves GEMM for the FP32 data type on PowerPC:

* Implements GEMM on large blocks with configurable block sizes mc, nc, kc (default: 256, 256, 256).
* Packing function optimized to access blocks according to the memory layout.
* GEMM optimized to work on larger blocks.
* Packing isolated from the GEMM operations for better MMA utilization.

Verified functionality and correctness using llama-cli and a standalone test case (performs a matmul and compares the final matrix C result with the base).

Minor code refactoring:

* Replaced a macro with an inline function.
* Made code indentation consistent at 4 spaces.

Performance testing: observed a 50% ~ 70% improvement in prompt processing speed, measured using llama-bench with the Meta-Llama3-8B FP32 model. Similar gains observed with the Mistral-7b-Instruct-v0.3 model.

| model            | size      | params | backend | threads | test   | patch (t/s) | base (t/s) |
| ---------------- | --------- | ------ | ------- | ------- | ------ | ----------- | ---------- |
| llama 8B all F32 | 29.92 GiB | 8.03 B | CPU     | 20      | pp512  | 98.58       | 60.30      |
| llama 8B all F32 | 29.92 GiB | 8.03 B | CPU     | 20      | pp1024 | 95.88       | 57.36      |
| llama 8B all F32 | 29.92 GiB | 8.03 B | CPU     | 20      | pp2048 | 85.46       | 53.26      |
| llama 8B all F32 | 29.92 GiB | 8.03 B | CPU     | 20      | pp4096 | 68.66       | 45.78      |
| llama 8B all F32 | 29.92 GiB | 8.03 B | CPU     | 20      | pp6144 | 57.35       | 40.44      |

Also observed a 25 ~ 30% improvement in prompt processing speed with llama-batched-bench on Meta-Llama3-8B for large prompts (256, 512, 1024, 2048, 4096 tokens) with various batch sizes (1, 2, 4, 8, 16).

Signed-off-by: Shalini Salomi Bodapati <[email protected]>
b6289
graph : fix assert in memory-less build_attn (#15590) ggml-ci