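In a prefill/decode (PD) disaggregated setup, do the prefill worker and the decode worker need to use the same TP size?
Not necessarily, as long as the KV layout can be converted between the different TP sizes. For example, Dynamo's block_copy.cu kernel does this: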
For decode and prefill with different KV layouts (i.e., due to different TP), Dynamo applies a high-performance kernel that transposes the KV blocks into their matching layout in the KV receiver after the NIXL reads and before the NIXL writes. https://github.com/ai-dynamo/dynamo/blob/main/docs/design_docs/disagg_serving.md
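To make the layout mismatch concrete, here is a minimal sketch of the index bookkeeping involved. It is not Dynamo's block_copy.cu kernel and not its API. It assumes KV heads are sharded contiguously across TP ranks and that the number of KV heads is divisible by both TP sizes; it ignores head replication when TP exceeds the number of KV heads. The function name `kv_head_sources` and its parameters are hypothetical.

```python
from typing import List, Tuple


def kv_head_sources(
    num_kv_heads: int, prefill_tp: int, decode_tp: int, decode_rank: int
) -> List[Tuple[int, int, int]]:
    """Return (prefill_rank, local_start, local_end) slices that together
    hold the KV heads owned by `decode_rank`.

    Assumption (hypothetical layout): heads are split contiguously, so
    rank r of a TP group of size t owns heads [r * H/t, (r + 1) * H/t).
    """
    if num_kv_heads % prefill_tp or num_kv_heads % decode_tp:
        raise ValueError("num_kv_heads must be divisible by both TP sizes")
    if not 0 <= decode_rank < decode_tp:
        raise ValueError("decode_rank out of range")

    heads_per_prefill = num_kv_heads // prefill_tp
    heads_per_decode = num_kv_heads // decode_tp

    # Global head range that this decode rank needs.
    start = decode_rank * heads_per_decode
    end = start + heads_per_decode

    sources = []
    head = start
    while head < end:
        # Prefill rank that owns the current global head.
        p_rank = head // heads_per_prefill
        p_base = p_rank * heads_per_prefill
        # Stop at the end of this prefill shard or of the needed range.
        p_end = min(end, p_base + heads_per_prefill)
        sources.append((p_rank, head - p_base, p_end - p_base))
        head = p_end
    return sources


if __name__ == "__main__":
    # Prefill TP=2, decode TP=4, 8 KV heads: each decode rank reads half
    # of one prefill shard.
    for rank in range(4):
        print(rank, kv_head_sources(8, 2, 4, rank))
```

When the decode TP is smaller than the prefill TP, one decode rank gathers heads from several prefill ranks; when it is larger, it takes a sub-slice of a single prefill shard. In both cases the received blocks are not in the receiver's native layout, which is why a transpose or copy step is needed on the KV receiver side.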