Commit 33b52ef
Static attention: support local-global attention (pytorch#13043)
Summary:
Runtime: support different cache lengths for different layers.
Python: add the sliding window cache update that was already present in the runtime.
Reviewed By: billmguo
Differential Revision: D79267644
1 parent cbbbc3a, commit 33b52ef
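As a rough illustration of the sliding window cache update described above, here is a minimal pure-Python sketch. The class name `SlidingWindowKVCache` and its methods are hypothetical, not the actual ExecuTorch static attention API; the real implementation operates on tensors, but the eviction behavior is the same: each layer keeps only its most recent `cache_len` entries, so local-attention layers can use a short window while global-attention layers use a long one.

```python
from collections import deque

class SlidingWindowKVCache:
    """Hypothetical sketch of a per-layer KV cache with a fixed window.

    Layers with different cache lengths simply construct this with
    different `cache_len` values (local layers small, global layers large).
    """

    def __init__(self, cache_len):
        self.cache_len = cache_len
        # deque(maxlen=...) silently evicts the oldest entry on overflow,
        # which is exactly the sliding-window update.
        self.keys = deque(maxlen=cache_len)
        self.values = deque(maxlen=cache_len)

    def update(self, new_keys, new_values):
        # Append the new K/V entries; anything older than the window
        # falls off the front automatically.
        self.keys.extend(new_keys)
        self.values.extend(new_values)
        return list(self.keys), list(self.values)
```

For example, a layer with `cache_len=3` that receives four entries retains only the last three, while a layer constructed with a larger `cache_len` would keep them all.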
File tree (3 files changed, +272 −139 lines):
- examples/models/llama
  - runner
  - tests