Conversation

@Todobe (Contributor) commented on Dec 19, 2025

No description provided.

@gemini-code-assist (Contributor) commented

Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@Todobe changed the title from "Optimize sinks attention" to "Optimize sinks attention for prefix cache" on Dec 19, 2025
@RuixuanZhang06 merged commit 5402f33 into sgl-project:main on Jan 12, 2026 (4 checks passed).

zhuyutong332 added a commit to zhuyutong332/sgl-kernel-npu that referenced this pull request on Jan 14, 2026:
* upstream/main:
  fix little batchsize and int8 quant on ci (sgl-project#302)
  optimize sinks attention (sgl-project#260)
  add swiglu_oai_triton (sgl-project#270)
  update tag to 2026.01.12 (sgl-project#312)
  feat:add performance compare (sgl-project#311)
  support add_gemma_rms_norm (sgl-project#310)
  optimize gdn gating and fused_qkvzba_split_reshape_cat (sgl-project#306)
  fix layout numTokensPerExpertTensor partial Initialization bug (sgl-project#303)
  Supplement A2 doc, software and hardware compatibility info (sgl-project#294)
  Added an environment variable to control whether to enable the Combine Ant Migration feature. (sgl-project#304)