06-PD Disaggregation: Questions and Feedback #3

@cr7258

In a PD-disaggregated (prefill/decode separated) setup, do the prefill worker and the decode worker need to use the same TP size?

Not necessarily. The TP sizes can differ as long as the system can convert the KV layout between them. For example, Dynamo's block_copy.cu kernel handles this:

For decode and prefill with different KV layouts (i.e., due to different TP), Dynamo applies a high-performance kernel that transposes the KV blocks into their matching layout in the KV receiver after the NIXL reads and before the NIXL writes. https://github.com/ai-dynamo/dynamo/blob/main/docs/design_docs/disagg_serving.md
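
To make the layout conversion concrete, here is a minimal NumPy sketch of re-partitioning one KV block from a prefill TP size to a decode TP size. It is not Dynamo's block_copy.cu implementation. It assumes that each TP rank holds a contiguous slice of KV heads, that a block shard has shape (heads_per_rank, block_size, head_dim), and that the number of heads is divisible by both TP sizes. The function name `reshard_kv_block` is hypothetical.

```python
import numpy as np

def reshard_kv_block(prefill_shards, decode_tp):
    """Re-partition one KV block from prefill TP shards into decode TP shards.

    Assumption: each shard has shape (heads_per_rank, block_size, head_dim)
    and ranks own contiguous groups of heads.
    """
    # Concatenate along the head axis to recover the full-head view of the block.
    full = np.concatenate(prefill_shards, axis=0)
    num_heads = full.shape[0]
    if num_heads % decode_tp != 0:
        raise ValueError("num_heads must be divisible by decode_tp")
    # Split into contiguous head groups, one per decode rank.
    return [np.ascontiguousarray(s) for s in np.split(full, decode_tp, axis=0)]

# Illustrative sizes only: 8 KV heads, 4 tokens per block, head_dim 2.
num_heads, block_size, head_dim = 8, 4, 2
full_block = np.arange(
    num_heads * block_size * head_dim, dtype=np.float32
).reshape(num_heads, block_size, head_dim)

prefill_shards = np.split(full_block, 4, axis=0)               # prefill TP = 4
decode_shards = reshard_kv_block(prefill_shards, decode_tp=2)  # decode TP = 2
```

Under these assumptions, each decode rank ends up with the heads that two prefill ranks held. A real kernel would do the same regrouping on the GPU without materializing the full block on a single device.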
