@jan-service-account

Updates dev branch with latest release (b5043) from ggml-org/llama.cpp

lhez and others added 10 commits April 2, 2025 17:01
* CANN: Fix memory waste in aclnn_tensor

* CANN: fix backend ops fail

* CANN: fix acl_tensor memory alloc.

* CANN: format

* CANN: remove trailing whitespace
ggml-org#9017)

* CUDA: Simplify and improve CUDA graphs through use of indirect copy pointers

Previously there was complexity in the CUDA graphs implementation due
to frequently changing parameters to copy kernels associated with K and V
cache pointers. This patch simplifies the implementation by using
indirection so that these parameters no longer change frequently,
avoiding the need for frequent graph updates.

Fixes ggml-org#12152
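
A minimal host-side sketch of the indirection idea described above, written in plain C++ rather than CUDA so it can be run anywhere. It is not the code from this patch: `RecordedCopy`, `replay`, and `run_steps` are hypothetical names, and a `memcpy` stands in for the copy kernel. The point it illustrates is that the recorded node stores the address of a pointer slot instead of the destination itself, so retargeting the copy means writing to the slot and the recorded parameters never change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a captured copy node: its fields are frozen
// once at "capture" time and are never edited afterwards.
struct RecordedCopy {
    const float *src;       // source buffer, fixed at capture
    float      **dst_slot;  // address of a pointer slot, NOT the destination itself
    std::size_t  count;     // number of floats to copy
};

// Replay the recorded copy. The destination is read through the slot at
// replay time, so it can differ between replays without touching the node.
inline void replay(const RecordedCopy &node) {
    assert(node.dst_slot != nullptr && *node.dst_slot != nullptr);
    std::memcpy(*node.dst_slot, node.src, node.count * sizeof(float));
}

// Simulate a few decoding steps: each step retargets the slot to the next
// cache position and replays the same, unmodified node.
inline std::vector<float> run_steps(std::size_t steps) {
    std::vector<float> cache(steps, 0.0f);  // stand-in for a K or V cache
    float  token_value = 0.0f;              // stand-in for the new K/V entry
    float *dst = nullptr;                   // the indirection slot
    const RecordedCopy node{&token_value, &dst, 1};  // "captured" once

    for (std::size_t i = 0; i < steps; ++i) {
        token_value = static_cast<float>(i + 1);
        dst = cache.data() + i;  // only the slot changes between replays
        replay(node);
    }
    return cache;
}
```

Without the slot, the node would have to hold the destination pointer directly and be updated on every step, which corresponds to the frequent graph updates the commit message describes.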

* Addressed comments

* fix HIP builds

* properly sync to stream

* removed ggml_cuda_cpy_fn_ptrs

* move stream sync before free

* guard to only use indirection with graphs

* style fixes

* check for errors

---------

Co-authored-by: slaren <[email protected]>
* [CANN]support sin cos argmax

Signed-off-by: noemotiovon <[email protected]>

* [CANN]codestyle adjustment

Signed-off-by: noemotiovon <[email protected]>

* [CANN]Remove redundant code

Signed-off-by: noemotiovon <[email protected]>

---------

Signed-off-by: noemotiovon <[email protected]>
Co-authored-by: noemotiovon <[email protected]>
* fix MUSA compiler warning

* replace (void) with GGML_UNUSED
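
A short sketch of the pattern behind this change, assuming `GGML_UNUSED` behaves like a macro wrapper around a void cast. `SKETCH_UNUSED` and `checked_half` are hypothetical names used only for illustration. The example uses a parameter that is read only inside `assert`, a common source of unused-parameter warnings once assertions are compiled out.

```cpp
#include <cassert>

// Hypothetical stand-in for GGML_UNUSED (assumed here to wrap a void cast).
#define SKETCH_UNUSED(x) (void)(x)

// expected_parity is only read inside assert(), which is compiled out when
// NDEBUG is defined; the macro marks it as intentionally unused in that case.
inline int checked_half(int value, int expected_parity) {
    assert(value % 2 == expected_parity);
    SKETCH_UNUSED(expected_parity);
    return value / 2;
}
```

Using one named macro instead of scattered `(void)` casts keeps the intent searchable and lets the definition be adjusted per compiler in a single place.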
…12738)

* Prefer vector flash decoding kernel for Gemma models

The vector flash decoding kernel was not being picked for models with head dimension 256. Gemma models are in this category.
Removing this limit improves end-to-end performance by up to 12% in generation-phase throughput for Gemma models.
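
The effect of lifting the limit can be sketched with a toy selection function. This is an illustration only, not the logic in `ggml/src/ggml-cuda/fattn.cu`: the names `FattnKernel` and `pick_kernel`, the cap value of 128 used in the usage note, and the rule that the vector kernel applies when a single query token is processed are all assumptions, and the real selection is likely to weigh more factors.

```cpp
#include <cassert>

// Hypothetical kernel families, named for illustration only.
enum class FattnKernel { Vector, Tile };

// max_vec_head_dim is a hypothetical cap on the head dimension for which the
// vector kernel is eligible; 0 means "no cap".
inline FattnKernel pick_kernel(int head_dim, int n_query_tokens, int max_vec_head_dim) {
    assert(head_dim > 0 && n_query_tokens > 0);
    // Assumption: the vector kernel targets token-by-token generation.
    const bool decoding  = n_query_tokens == 1;
    const bool under_cap = max_vec_head_dim == 0 || head_dim <= max_vec_head_dim;
    return (decoding && under_cap) ? FattnKernel::Vector : FattnKernel::Tile;
}
```

With an assumed cap of 128, `pick_kernel(256, 1, 128)` falls back to the non-vector path, while `pick_kernel(256, 1, 0)` selects the vector kernel, which is the situation the commit describes for head dimension 256.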

* Update ggml/src/ggml-cuda/fattn.cu

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>
@jan-service-account jan-service-account merged commit 98f7290 into dev Apr 4, 2025
10 checks passed
@jan-service-account jan-service-account deleted the update-dev-from-master-2025-04-04-00-08 branch April 4, 2025 00:17