Commit ed09381
[None][fix] Set token IDs on request after router tokenization to avoid re-tokenization
KvCacheAwareRouter now sets prompt_token_ids (ChatCompletionRequest) or
replaces prompt with token IDs (CompletionRequest) after tokenizing,
so the downstream worker server skips redundant tokenization.
Also adds proper ChatCompletionRequest handling via apply_chat_template.
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
1 parent 0467fcd commit ed09381
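The behavior described in the commit message can be illustrated with a minimal sketch. It is not the code from this commit: the two request classes are simplified stand-ins for the real ones, and `ToyTokenizer` and `attach_token_ids` are hypothetical names invented for illustration. The sketch assumes a single string prompt and a tokenizer whose `apply_chat_template` returns token IDs directly.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class CompletionRequest:  # simplified stand-in, not the real class
    prompt: Union[str, List[int]]


@dataclass
class ChatCompletionRequest:  # simplified stand-in, not the real class
    messages: List[Dict[str, str]]
    prompt_token_ids: Optional[List[int]] = None


class ToyTokenizer:
    """Hypothetical tokenizer: one ID per whitespace-separated word."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}
        self.encode_calls = 0  # counts how often tokenization actually runs

    def encode(self, text: str) -> List[int]:
        self.encode_calls += 1
        return [self.vocab.setdefault(w, len(self.vocab)) for w in text.split()]

    def apply_chat_template(self, messages: List[Dict[str, str]]) -> List[int]:
        # Assumed template: "<role> content" per message, joined by spaces.
        text = " ".join(f"<{m['role']}> {m['content']}" for m in messages)
        return self.encode(text)


def attach_token_ids(
    request: Union[CompletionRequest, ChatCompletionRequest],
    tokenizer: ToyTokenizer,
) -> List[int]:
    """Tokenize once in the router and store the IDs on the request."""
    if isinstance(request, ChatCompletionRequest):
        # Chat requests keep their messages and gain prompt_token_ids.
        if request.prompt_token_ids is None:
            request.prompt_token_ids = tokenizer.apply_chat_template(
                request.messages
            )
        return request.prompt_token_ids
    # Completion requests have the text prompt replaced by token IDs.
    if isinstance(request.prompt, str):
        request.prompt = tokenizer.encode(request.prompt)
    return request.prompt
```

Because the IDs travel with the request, a downstream consumer that checks for them first can skip its own tokenization step. In this sketch a second call to `attach_token_ids` on the same request leaves `encode_calls` unchanged.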
File tree
3 files changed, +591 -9 lines changed
- scripts/attribution/data
- tensorrt_llm/serve