@k50112113 commented on Nov 4, 2025

This PR includes optimizations for DS-R1 FP8 TP8:

  1. Add an a16w8 GEMM for o_proj during decode.
  2. Add rocm_aiter_triton_qkv_a_proj_layernorm.

This PR depends on ROCm/aiter#1328.
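
For readers unfamiliar with the "a16w8" shorthand in item 1, the sketch below shows the general idea in plain Python: activations stay in higher precision while the weights are stored as 8-bit codes plus a scale, and the scale is applied after the dot product. It is reference semantics only, not the Triton kernel from this PR. The helper names `quantize_rows` and `a16w8_gemm` are hypothetical, and symmetric int8 codes with one scale per output row are an assumption standing in for the FP8 weight format and whatever scaling layout the real kernel uses.

```python
# Reference semantics of a weight-only 8-bit GEMM ("a16w8"), in plain Python.
# ASSUMPTIONS (not taken from the PR): int8 codes stand in for FP8 weights,
# one scale per output row, weights laid out as [N][K] like a linear layer.

def quantize_rows(weight):
    """Quantize each output row of `weight` to int8 codes plus a float scale."""
    codes, scales = [], []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        codes.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return codes, scales


def a16w8_gemm(x, codes, scales):
    """Compute y[m][n] = scales[n] * sum_k x[m][k] * codes[n][k]."""
    out = []
    for x_row in x:
        out_row = []
        for w_row, scale in zip(codes, scales):
            acc = 0.0
            for x_val, q in zip(x_row, w_row):
                acc += x_val * q          # accumulate on the raw 8-bit codes
            out_row.append(acc * scale)   # apply the row scale once at the end
        out.append(out_row)
    return out


# Decode-style call: a single token (M = 1) against a small weight matrix.
weight = [[1.0, -2.0], [0.5, 0.25]]
codes, scales = quantize_rows(weight)
y = a16w8_gemm([[1.0, 1.0]], codes, scales)
```

Applying the scale once per output element, rather than dequantizing every weight first, is the usual reason such a kernel can read less memory per token. Whether the PR's kernel is structured this way is not stated in the description.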

@k50112113 changed the title from "[355_wip] add a16w8 gemm for DS-R1 for o_proj for decode, add rocm_aiter_triton…" to "[Triton] add a16w8 gemm for DS-R1 for o_proj for decode, add rocm_aiter_triton…" on Nov 4, 2025

@k50112113 force-pushed the shaoclee/355_wip_ds_a16w8_reduce_rms_rope_rebase branch from 4b1dcc7 to ae9c305 on November 18, 2025 14:18