When the attn kernel reads the kv cache, prefill uses LinearIter and decode uses BlockIter. What is the reasoning behind this design? #2518
Time-Limit
started this conversation in
General
Replies: 1 comment
For Attention, BlockIter is slightly better than LinearIter on short contexts, but considerably worse on long contexts.
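To make the two access patterns concrete, here is a toy Python sketch. It is not the project's kernel code: `linear_iter`, `block_iter`, `to_paged`, the block table layout, and all sizes are hypothetical names and values chosen for illustration. It only shows that a linear iterator walks one contiguous buffer, while a block iterator has to resolve each tile through a block table first. Whether that extra indirection is what causes the long-context gap is an assumption, not something stated in this thread.

```python
# Illustrative only: toy stand-ins for the two access patterns, not real kernels.

def linear_iter(kv, tile):
    """Yield tiles from a contiguous KV buffer; the tile address is just an offset."""
    for start in range(0, len(kv), tile):
        yield kv[start:start + tile]


def block_iter(blocks, block_table, seq_len, block_size, tile):
    """Yield tiles from a paged KV cache; each tile needs a block-table lookup."""
    assert block_size % tile == 0, "sketch assumes tiles never straddle blocks"
    for start in range(0, seq_len, tile):
        physical_block = block_table[start // block_size]  # extra indirection
        offset = start % block_size
        end = min(offset + tile, offset + (seq_len - start))
        yield blocks[physical_block][offset:end]


def to_paged(kv, block_table, block_size):
    """Scatter a contiguous buffer into physical blocks (hypothetical helper)."""
    blocks = [[None] * block_size for _ in range(max(block_table) + 1)]
    for pos, value in enumerate(kv):
        physical_block = block_table[pos // block_size]
        blocks[physical_block][pos % block_size] = value
    return blocks


kv = list(range(10))        # 10 cached tokens, stored contiguously
block_table = [2, 0, 1]     # logical block i lives in physical block block_table[i]
blocks = to_paged(kv, block_table, block_size=4)

linear_tiles = list(linear_iter(kv, tile=2))
paged_tiles = list(block_iter(blocks, block_table, len(kv), block_size=4, tile=2))
```

Both iterators return the same tiles in the same order; only the addressing differs. Any cost difference would come from how the data is located and loaded, not from what is computed.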