Why launch 3 kernels for the prefill stage? #2501
Answered by lzhangzz
sleepwalker2017 asked this question in Q&A
Replies: 1 comment 1 reply
Answer selected by sleepwalker2017
Looking at the code, the invokeProcessKV_v2_ kernel handles RoPE and the quantization-related work, while invokeFlattenKV_v2_ appears to be RoPE-related only, and dispatchAttention performs the attention computation itself. My question: why does the prefill stage need to launch three separate kernels? Thanks!

Follow-up, on my reading of the data flow (sketched below):
First kernel: load from contiguous memory -> preprocess -> write into KV-cache pages
Second kernel: load from the pages -> write back into contiguous memory?
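To make that data movement concrete, here is a minimal CPU-side sketch of the flow as I understand it, under a toy paged layout. It is not TurboMind's implementation; process_kv, flatten_kv, kBlockLen, kHeadDim and all other names are made up for illustration only.

```cpp
// Toy model of: contiguous K -> (RoPE) -> paged blocks -> gather -> contiguous K.
// Everything here is a hypothetical sketch, not the real TurboMind kernels.
#include <cmath>
#include <cstdio>
#include <vector>

constexpr int kBlockLen = 4;  // tokens per KV-cache block (toy value)
constexpr int kHeadDim  = 8;  // head dimension (toy value)

// Stage 1 (role of invokeProcessKV_v2_ as described above): read K for each new
// token from the contiguous activation buffer, apply a RoPE-style rotation
// (placeholder math), and scatter the result into paged KV-cache blocks.
void process_kv(const std::vector<float>& k_contig, int num_tokens,
                std::vector<std::vector<float>>& blocks)
{
    for (int t = 0; t < num_tokens; ++t) {
        int block = t / kBlockLen, slot = t % kBlockLen;
        for (int d = 0; d < kHeadDim; d += 2) {
            float theta = t / std::pow(10000.f, float(d) / kHeadDim);
            float c = std::cos(theta), s = std::sin(theta);
            float x = k_contig[t * kHeadDim + d];
            float y = k_contig[t * kHeadDim + d + 1];
            // the rotated pair is written directly into the paged cache
            blocks[block][slot * kHeadDim + d]     = x * c - y * s;
            blocks[block][slot * kHeadDim + d + 1] = x * s + y * c;
        }
    }
}

// Stage 2 (role of invokeFlattenKV_v2_ as described above): gather the already
// processed K back from the paged blocks into one contiguous buffer so the
// attention kernel can read it linearly.
void flatten_kv(const std::vector<std::vector<float>>& blocks, int num_tokens,
                std::vector<float>& k_flat)
{
    for (int t = 0; t < num_tokens; ++t) {
        int block = t / kBlockLen, slot = t % kBlockLen;
        for (int d = 0; d < kHeadDim; ++d)
            k_flat[t * kHeadDim + d] = blocks[block][slot * kHeadDim + d];
    }
}

int main()
{
    int num_tokens = 6;
    std::vector<float> k_contig(num_tokens * kHeadDim, 1.f);
    int num_blocks = (num_tokens + kBlockLen - 1) / kBlockLen;
    std::vector<std::vector<float>> blocks(
        num_blocks, std::vector<float>(kBlockLen * kHeadDim, 0.f));
    std::vector<float> k_flat(num_tokens * kHeadDim, 0.f);

    process_kv(k_contig, num_tokens, blocks);  // contiguous -> paged
    flatten_kv(blocks, num_tokens, k_flat);    // paged -> contiguous
    // Stage 3 (role of dispatchAttention) would now run attention over k_flat.
    std::printf("k_flat[0..3] = %.3f %.3f %.3f %.3f\n",
                k_flat[0], k_flat[1], k_flat[2], k_flat[3]);
    return 0;
}
```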