Apply suggestion from @simon-mo

simon-mo · web-flow · commit 400c14ac8368 · 2025-09-29T05:43:01.000-07:00
diff --git a/_posts/2025-09-29-deepseek-v3-2.md b/_posts/2025-09-29-deepseek-v3-2.md
@@ -86,7 +86,7 @@ We need to mark the start context and the end context for each query token. We u
 
 In this case, `ks` will be `[0] * q1 + [q1] * q2 + ... + [q1 + q2 + ... + qb] * qb`. Here `*` means repeating the list. `ke` will be `list(range(n1 - q1, n1, 1)) + list(range(n2 - q2, n2, 1)) + ... + list(range(nb - qb, nb, 1))` plus the offset of `ks`.
 
-After computing the logits, we need to perform the `topk` operation. However, a clear challenge is that at high batch size with long context, the logits tensor is materialized before running a row-wise `topk`. The vLLM team is working on a fused version inspired by FlashAttention, so we can run an online topk in a way that we don't need to materialize the intermediate logits. 
+After computing the logits, we need to perform the `topk` operation. However, a clear challenge is that at high batch size with long context, the logits tensor is materialized before running a row-wise `topk`. 
 
 #### Fusion pass, more kernels, and Blackwell Support