Releases: ngxson/llama.cpp

b6305

28 Aug 08:12
d35a1e8
cli : change log to warning to explain reason for stopping (#15604)

* Change to warn instead of debug, to explain reason for stopping.

* Update tools/main/main.cpp

Fix printing --2

Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>

b6303

28 Aug 01:19
5a0e3ef
cuda: Add cublasLt_static linking when GGML_STATIC is enabled (#15622)

Prior to this change, we faced undefined cublasLt references when
attempting to compile 'llama-cli' with GGML_STATIC=ON on Linux.

We add linking with CUDA::cublasLt_static when CUDA version is greater
than 10.1.

b6299

27 Aug 11:16
1bded5a
kv-cache : better estimate of n_kv for multi-sequence batches (#15610)

ggml-ci

b6298

27 Aug 09:41
1e74897
CANN: refactor mask handling and improve performance in FA (#15561)

* CANN(flash-attn): refactor mask handling and improve performance

1. Refactored the mask computation in Flash Attention, unifying the logic so that prefill and decode are no longer handled separately.
2. Optimized performance in non-alibi scenarios by reducing one repeat operation.
3. Updated operator management to explicitly mark unsupported cases on 310P devices and when dim is not divisible by 16.

Signed-off-by: noemotiovon <[email protected]>

* [CANN]: fix review

Signed-off-by: noemotiovon <[email protected]>

* [CANN]: Optimization FA BNSD to BSND

Signed-off-by: noemotiovon <[email protected]>

---------

Signed-off-by: noemotiovon <[email protected]>

b6297

27 Aug 09:06
1cf123a
ggml-cpu : add basic RVV support for vector f32 ops (#15057)

* ggml-cpu : add basic RVV support for vector f32 ops

* ggml-cpu : add RVV support for f32 softmax

b6295

27 Aug 07:03
86076f9
OpenCL: add fused group_norm/norm, mul, add (#15314)

* add fused group_norm/norm, mul, add

* fix spacing

* revert rms_norm logic

* fix trailing whitespace
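
For readers unfamiliar with the operation being fused, the following is a minimal scalar sketch of what a "norm, mul, add" chain computes on one row. It assumes the norm step is a plain mean/variance normalization with an epsilon, followed by an element-wise multiply and add. The function name `norm_mul_add` and its signature are hypothetical and are not taken from the OpenCL kernels in this change.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference: y[i] = ((x[i] - mean) / sqrt(var + eps)) * w[i] + b[i]
// computed in a single pass over the output instead of three separate ops.
std::vector<float> norm_mul_add(const std::vector<float>& x,
                                const std::vector<float>& w,
                                const std::vector<float>& b,
                                float eps) {
    assert(!x.empty() && x.size() == w.size() && x.size() == b.size());
    const std::size_t n = x.size();

    // Mean of the row.
    double sum = 0.0;
    for (float v : x) sum += v;
    const double mean = sum / static_cast<double>(n);

    // Biased variance of the row.
    double var = 0.0;
    for (float v : x) {
        const double d = v - mean;
        var += d * d;
    }
    var /= static_cast<double>(n);

    const double scale = 1.0 / std::sqrt(var + eps);

    // Normalize, scale and shift without materializing intermediate tensors.
    std::vector<float> y(n);
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = static_cast<float>((x[i] - mean) * scale) * w[i] + b[i];
    }
    return y;
}
```

The point of fusing is that the normalized values are consumed immediately rather than written out and read back by separate multiply and add kernels.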

b6293

26 Aug 19:21
8b69686
SYCL: fix rms_norm_mul_add for tensor dim not a multiple of sg_size (…

b6291

26 Aug 16:08
44b1efa
tests: add performance test for mul mat id (#15543)

b6290

26 Aug 16:05
a6a58d6
llamafile: PowerPC Sgemm Optimization (#15558)

This patch improves GEMM for FP32 Data Type on PowerPC

Implements GEMM on large blocks with configurable block size mc, nc, kc
(default: 256, 256, 256).
Packing Function optimized to access blocks as per memory layout.
GEMM Optimized to work on larger blocks.
Isolated Packing from GEMM Operations for better MMA utilization.
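
The blocking and packing scheme described above can be sketched in portable C++ as follows. This is only an illustration of the general technique (loop over mc x kc and kc x nc tiles, copy each tile into a contiguous buffer, then multiply the packed tiles), not the PowerPC MMA code from this patch. The names `BlockSizes`, `pack_tile` and `gemm_blocked` are hypothetical, and the row-major layout is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block-size holder; the defaults mirror the 256, 256, 256 stated above.
struct BlockSizes {
    std::size_t mc = 256;
    std::size_t nc = 256;
    std::size_t kc = 256;
};

// Copy a rows x cols tile of a row-major matrix (leading dimension ld)
// into a contiguous buffer so the inner kernel reads memory sequentially.
static void pack_tile(const float* src, std::size_t ld,
                      std::size_t rows, std::size_t cols,
                      std::vector<float>& dst) {
    dst.resize(rows * cols);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            dst[r * cols + c] = src[r * ld + c];
}

// C (m x n) += A (m x k) * B (k x n), all row-major.
void gemm_blocked(const float* A, const float* B, float* C,
                  std::size_t m, std::size_t n, std::size_t k,
                  BlockSizes bs = BlockSizes()) {
    assert(bs.mc > 0 && bs.nc > 0 && bs.kc > 0);
    std::vector<float> a_pack, b_pack;

    for (std::size_t jc = 0; jc < n; jc += bs.nc) {
        const std::size_t nb = std::min(bs.nc, n - jc);
        for (std::size_t pc = 0; pc < k; pc += bs.kc) {
            const std::size_t kb = std::min(bs.kc, k - pc);
            // Packing is kept separate from the multiply loop below.
            pack_tile(B + pc * n + jc, n, kb, nb, b_pack);
            for (std::size_t ic = 0; ic < m; ic += bs.mc) {
                const std::size_t mb = std::min(bs.mc, m - ic);
                pack_tile(A + ic * k + pc, k, mb, kb, a_pack);
                // Plain scalar kernel over the packed tiles; a real
                // implementation would use vector or matrix instructions here.
                for (std::size_t i = 0; i < mb; ++i)
                    for (std::size_t p = 0; p < kb; ++p) {
                        const float a = a_pack[i * kb + p];
                        for (std::size_t j = 0; j < nb; ++j)
                            C[(ic + i) * n + jc + j] += a * b_pack[p * nb + j];
                    }
            }
        }
    }
}
```

Because `gemm_blocked` accumulates into `C`, callers should zero the output first when a plain product is wanted. Keeping `pack_tile` separate from the multiply loop mirrors the isolation of packing from GEMM mentioned in the list.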

Verified functionality and correctness using llama-cli and a standalone
test case (performs matmul and compares the final matrix C result with base).

Minor code refactoring changes:
Replace macro with inline function
Code Indent made consistent with 4 spaces

Performance Testing:

Observed 50% ~ 70% improvement in Prompt Processing Speed measured using
llama-bench with Meta-Llama3-8B FP32 Model. Similar gains observed with
Mistral-7b-Instruct-v0.3 Model.

Model               Size        Params   Backend   Threads   Test     Patch   Base
llama 8B all F32    29.92 GiB   8.03 B   CPU       20        pp512    98.58   60.3
llama 8B all F32    29.92 GiB   8.03 B   CPU       20        pp1024   95.88   57.36
llama 8B all F32    29.92 GiB   8.03 B   CPU       20        pp2048   85.46   53.26
llama 8B all F32    29.92 GiB   8.03 B   CPU       20        pp4096   68.66   45.78
llama 8B all F32    29.92 GiB   8.03 B   CPU       20        pp6144   57.35   40.44
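
To relate the table to the quoted percentage range, the per-row gain can be computed as (Patch / Base - 1) x 100. The sketch below does that for the five rows above. It assumes Patch and Base are throughput figures where higher is better (the table does not state units), and the names `BenchRow`, `bench_rows`, `improvement_percent` and `improvements` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One row of the table above: test name, patched throughput, baseline throughput.
struct BenchRow {
    const char* test;
    double patch;
    double base;
};

// Values copied from the Patch and Base columns of the table.
std::vector<BenchRow> bench_rows() {
    return {
        {"pp512",  98.58, 60.30},
        {"pp1024", 95.88, 57.36},
        {"pp2048", 85.46, 53.26},
        {"pp4096", 68.66, 45.78},
        {"pp6144", 57.35, 40.44},
    };
}

// Relative gain of patch over base, in percent.
double improvement_percent(double patch, double base) {
    assert(base > 0.0);
    return (patch / base - 1.0) * 100.0;
}

// Gain for every row, in table order.
std::vector<double> improvements(const std::vector<BenchRow>& rows) {
    std::vector<double> out;
    out.reserve(rows.size());
    for (const BenchRow& r : rows) {
        out.push_back(improvement_percent(r.patch, r.base));
    }
    return out;
}
```

Calling `improvements(bench_rows())` gives gains of roughly 42% to 67% across the five prompt sizes, with the smallest gain at pp6144.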

25 ~ 30% improvement in llama-batched-bench with Meta-Llama3-8B in
Prompt Processing Speed for large prompts (256, 512, 1024, 2048, 4096 tokens)
with various batch sizes (1, 2, 4, 8, 16).

Signed-off-by: Shalini Salomi Bodapati <[email protected]>

b6289

26 Aug 15:46
0373486
graph : fix assert in memory-less build_attn (#15590)

ggml-ci