You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This commit was created on GitHub.com and signed with GitHub’s verified signature.
CUDA: Add mul_mat_id support for the mmf kernel (#15767)
* CUDA: Add mul_mat_id support the mmf
Add support for mul_mat_id for bs < 16
* Review: use warp_size, fix should_use_mmf condition
* Launch one block per expert, stride along n_expert_used
* templatize mul_mat_id
* Pad shmem to 16 bytes, add helper function mul_mat_f_switch_ids
* Reduce compile times by dividing mmf into f16, bf16 and f32 variants
* Divide mmf by ncols_dst
* Add missing files
* Fix MUSA/HIP builds