Commit ed09381

[None][fix] Set token IDs on request after router tokenization to avoid re-tokenization

KvCacheAwareRouter now sets prompt_token_ids (ChatCompletionRequest) or replaces prompt with token IDs (CompletionRequest) after tokenizing, so the downstream worker server skips redundant tokenization. Also adds proper ChatCompletionRequest handling via apply_chat_template.

Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
1 parent 0467fcd commit ed09381
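
The behavior described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the two request classes are simplified stand-ins that borrow the names from the message, and `ToyTokenizer` and `tokenize_and_attach` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class CompletionRequest:
    # Stand-in type: prompt may be raw text or already-tokenized IDs.
    prompt: Union[str, List[int]]


@dataclass
class ChatCompletionRequest:
    # Stand-in type: messages plus an optional slot for pre-computed token IDs.
    messages: List[Dict[str, str]]
    prompt_token_ids: Optional[List[int]] = None


class ToyTokenizer:
    """Hypothetical tokenizer: one ID per character, for illustration only."""

    def encode(self, text: str) -> List[int]:
        return [ord(ch) for ch in text]

    def apply_chat_template(self, messages: List[Dict[str, str]]) -> str:
        # Flatten the chat messages into a single prompt string.
        return "".join(f"<{m['role']}>{m['content']}\n" for m in messages)


def tokenize_and_attach(request, tokenizer) -> List[int]:
    """Tokenize once on the router side and store the IDs on the request."""
    if isinstance(request, ChatCompletionRequest):
        if request.prompt_token_ids is None:
            text = tokenizer.apply_chat_template(request.messages)
            request.prompt_token_ids = tokenizer.encode(text)
        return request.prompt_token_ids
    # CompletionRequest: replace a text prompt with its token IDs in place.
    if isinstance(request.prompt, str):
        request.prompt = tokenizer.encode(request.prompt)
    return request.prompt
```

In this sketch, a worker that receives the request can check for `prompt_token_ids` (or a list-valued `prompt`) and skip its own tokenization step, which is the saving the commit message refers to.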

3 files changed, +591 -9 lines changed

scripts/attribution/data/dependency_metadata.yml

Lines changed: 4 additions & 0 deletions
@@ -20,6 +20,10 @@ cutlass/v4.3.0:
   copyright: 501a1dd5925b22dbe66ce8f195440dec
   license: 0d49472f2c1269d573d79e5603778a9f
   source: https://github.com/NVIDIA/cutlass/tree/v4.3.0
+cutlass/v4.4.1:
+  copyright: aa8578535b24304362e63f0e170d8e60
+  license: 1078b45037d7c56c5c2af36977a4d08e
+  source: https://github.com/NVIDIA/cutlass/tree/v4.4.1
 deep_ep/5be51b228a7c82dbdb213ea58e77bffd12b38af8:
   license: 452b3ef002dc6ec283fb723f0dd84997
   source: https://github.com/deepseek-ai/DeepEP/tree/5be51b228a7c82dbdb213ea58e77bffd12b38af8
