Commit 400c14a

Apply suggestion from @simon-mo
1 parent ffe2b1e commit 400c14a

1 file changed: +1, -1 lines changed

_posts/2025-09-29-deepseek-v3-2.md

Lines changed: 1 addition & 1 deletion
@@ -86,7 +86,7 @@ We need to mark the start context and the end context for each query token. We u
 
 In this case, `ks` will be `[0] * q1 + [q1] * q2 + ... + [q1 + q2 + ... + qb] * qb`. Here `*` means repeating the list. `ke` will be `list(range(n1 - q1, n1, 1)) + list(range(n2 - q2, n2, 1)) + ... + list(range(nb - qb, nb, 1))` plus the offset of `ks`.
 
-After computing the logits, we need to perform the `topk` operation. However, a clear challenge is that at high batch size with long context, the logits tensor is materialized before running a row-wise `topk`. The vLLM team is working on a fused version inspired by FlashAttention, so we can run an online topk in a way that we don't need to materialize the intermediate logits.
+After computing the logits, we need to perform the `topk` operation. However, a clear challenge is that at high batch size with long context, the logits tensor is materialized before running a row-wise `topk`.
 
 #### Fusion pass, more kernels, and Blackwell Support
 
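The `ks`/`ke` construction in the diff's context lines can be written out in Python. This is a minimal sketch, assuming `q[i]` is the number of query tokens and `n[i]` the context length of request `i`. It also assumes each request's offset is the running sum of the query lengths before it, following the pattern of the first two terms (`[0] * q1`, `[q1] * q2`). Continued to the last request, that pattern gives `q1 + ... + q(b-1)`. The function name `build_ks_ke` is hypothetical and is not a vLLM API.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def build_ks_ke(q: Sequence[int], n: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: per-query-token start (ks) and end (ke) context indices.

    q[i] is the number of query tokens of request i, n[i] its context length.
    """
    assert len(q) == len(n), "q and n must describe the same requests"
    # Exclusive prefix sum of query lengths: [0, q1, q1 + q2, ...].
    offsets = [0] + list(accumulate(q))[:-1]
    ks: List[int] = []
    ke: List[int] = []
    for qi, ni, off in zip(q, n, offsets):
        # Every query token of request i shares the same start offset.
        ks += [off] * qi
        # Local ends n_i - q_i, ..., n_i - 1, shifted by the same offset as ks.
        ke += [off + e for e in range(ni - qi, ni)]
    return ks, ke


ks, ke = build_ks_ke(q=[2, 3], n=[5, 4])
print(ks)  # [0, 0, 2, 2, 2]
print(ke)  # [3, 4, 3, 4, 5]
```

Here `ke` is the element-wise sum of the per-request `range(n_i - q_i, n_i)` values and the matching `ks` entry, which is one literal reading of "plus the offset of `ks`".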

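The revised line keeps only the problem statement: the full logits tensor exists before the row-wise `topk` runs. The removed sentence referred to an online top-k that avoids materializing that intermediate. The sketch below is a generic plain-Python illustration, not vLLM's kernel, and both function names are hypothetical. It contrasts a row-wise top-k over a materialized logits matrix with a streaming variant. The streaming variant scores keys one chunk at a time and keeps only `k` running candidates per row, so it holds roughly `chunk + k` scores at once instead of a whole row.

```python
import heapq
from typing import Callable, List, Sequence, Tuple


def topk_rows_materialized(logits: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Row-wise top-k over a fully materialized [num_queries, num_keys] matrix."""
    out: List[List[int]] = []
    for row in logits:
        # Highest score first; ties go to the lower key index.
        order = sorted(range(len(row)), key=lambda j, r=row: (r[j], -j), reverse=True)
        out.append(order[:k])
    return out


def topk_row_streaming(
    score_chunk: Callable[[int, int], Sequence[float]],
    num_keys: int,
    k: int,
    chunk: int,
) -> List[int]:
    """Top-k for one query row, producing scores chunk by chunk (never a full row)."""
    best: List[Tuple[float, int]] = []  # running (score, key_index) candidates
    for start in range(0, num_keys, chunk):
        end = min(start + chunk, num_keys)
        scores = score_chunk(start, end)
        candidates = best + [(s, start + j) for j, s in enumerate(scores)]
        # Same ordering rule as the materialized version.
        best = heapq.nlargest(k, candidates, key=lambda p: (p[0], -p[1]))
    return [idx for _, idx in best]


logits = [[0.1, 0.9, 0.3, 0.9, 0.5], [2.0, -1.0, 0.0, 1.5, 3.0]]
print(topk_rows_materialized(logits, k=2))  # [[1, 3], [4, 0]]
print([topk_row_streaming(lambda s, e, r=row: r[s:e], len(row), k=2, chunk=2) for row in logits])
```

Both functions return the same indices in the same order. The streaming version does extra merge work per chunk but never holds a full row of logits.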