@jan-service-account

Updates dev branch with latest release (b6123) from ggml-org/llama.cpp

am17an and others added 2 commits August 9, 2025 20:00
* CUDA: add attention sinks for tile and wmma

* Review: formatting changes + remove syncthreads from tile + remove warp_reduce_max from wmma
* cuda: refactored ssm_scan to use CUB

* fixed compilation error when not using CUB

* assign L to constant and use size_t instead of int

* deduplicated functions

* change min blocks per mp to 1

* Use cub load and store warp transpose

* suppress clang warning
@jan-service-account jan-service-account merged commit fb8fbcd into dev Aug 10, 2025
9 checks passed
@jan-service-account jan-service-account deleted the update-dev-from-master-2025-08-10-00-14 branch August 10, 2025 00:27

4 participants