
Commit e049c47

Update deterministic blog (#204)
1 parent 30d0b9b commit e049c47

1 file changed (+3, −2 lines)


blog/2025-09-22-sglang-deterministic.md

Lines changed: 3 additions & 2 deletions
@@ -141,7 +141,7 @@ The standard chunking strategy operates on a "best-effort" principle. In this ex
 Attention kernel is an important part of determinism. For different attention backends, we modified them in different ways to satisfy their usage requirements.
 - For Flashinfer backend, we utilize the `fixed_split_size` and `disable_kv_split` arguments from [batch invariant FA2 kernels](https://github.com/flashinfer-ai/flashinfer/pull/1675) to fix split sizes during kernel planning. Truncation of chunked prefill is aligned to the prefill split size. ([PR link](https://github.com/sgl-project/sglang/pull/10645))
 - For FlashAttention-3 backend, num-splits of flash attention kernel are fixed to 1 to ensure determinism. ([PR link](https://github.com/sgl-project/sglang/pull/10651))
-- For Triton backend, we fix the split size of decoding, and manually set the alignment size of chunked prefill. ([PR link](https://github.com/sgl-project/sglang/pull/10694))
+- For Triton backend, we fix the split size of decoding, and manually set the alignment size of chunked prefill. Deterministic inference can also run on **AMD** hardware with the extensibility of Triton backend. ([PR link](https://github.com/sgl-project/sglang/pull/10694)).

 ### Reproducible Non-Greedy Sampling
@@ -173,5 +173,6 @@ Our future efforts will focus on enhancing deterministic inference by addressing
 We would like to extend our heartfelt gratitude to the following teams and collaborators:
 - **SGLang team and community**: Baizhou Zhang, Biao He, Qiaolin Yu, Xinyuan Tong, Ke Bao, Yineng Zhang, Chi Zhang, Ying Sheng, Lianmin Zheng and many others
 - **Flashinfer team and community**: Wenxuan Tan, Yilong Zhao, Zihao Ye
-- **slime team and community**: Yusheng Su, Zilin Zhu
+- **slime team and community**: Zilin Zhu
+- **AMD**: Yusheng Su
 - **Thinking Machines Lab**: for their awesome [blog](https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/) and [batch_invariant_ops library](https://github.com/thinking-machines-lab/batch_invariant_ops)
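For readers skimming the diff, below is a minimal sketch of the principle behind the fixed-split-size changes, in plain NumPy rather than SGLang's or FlashInfer's actual kernel code (the `attn_decode` helper and every shape in it are illustrative assumptions). Floating-point addition is non-associative, so if the kernel planner picks a different KV split schedule at a different batch size, the reduction order changes and the output bits change with it; pinning the split size pins the schedule.

```python
# Illustrative sketch only: not SGLang/FlashInfer kernel code. Shows why a
# fixed KV split size makes single-query decode attention batch-invariant.
import numpy as np

def attn_decode(q, k, v, split_size):
    """Single-query decode attention, reduced over the KV cache in chunks of
    `split_size`, merging partials with the usual online-softmax update."""
    m = -np.inf                   # running max of attention logits
    denom = np.float32(0.0)       # running softmax denominator
    out = np.zeros_like(v[0])     # running weighted sum of values
    for s in range(0, k.shape[0], split_size):
        logits = k[s:s + split_size] @ q
        m_new = max(m, logits.max())
        scale = np.exp(np.float32(m - m_new))  # rescale old accumulators
        p = np.exp(logits - m_new)
        denom = denom * scale + p.sum()
        out = out * scale + p @ v[s:s + split_size]
        m = m_new
    return out / denom

rng = np.random.default_rng(0)
q = rng.standard_normal(64).astype(np.float32)
k = rng.standard_normal((4096, 64)).astype(np.float32)
v = rng.standard_normal((4096, 64)).astype(np.float32)

a = attn_decode(q, k, v, split_size=256)  # schedule a planner might pick at batch size 1
b = attn_decode(q, k, v, split_size=512)  # schedule it might pick at a larger batch size
c = attn_decode(q, k, v, split_size=256)  # fixed split size: same schedule every run

print(np.array_equal(a, b))  # typically False: different reduction order, different bits
print(np.array_equal(a, c))  # True: identical schedule gives bitwise-identical output
```

Fixing num-splits to 1, as the FlashAttention-3 change does, is the limiting case of the same idea: a single split admits only one reduction order, trading some parallelism for invariance.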
