Commit 1510e3b
Static attention: support local-global attention (pytorch#13043)
Summary:
Runtime: support different cache lengths for different layers.
Python: add the sliding-window cache update, which was already supported in the runtime.
Reviewed By: billmguo
Differential Revision: D792676441
Parent: cf2f170
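To make the change concrete, below is a minimal Python sketch of a sliding-window KV cache with per-layer cache lengths, in the spirit of the summary above. All names (`SlidingWindowCache`, `update`), shapes, and the local/global layer pattern are illustrative assumptions, not the actual executorch API.

```python
# Hypothetical sketch, not the executorch implementation: local-attention
# layers keep a short sliding-window cache; global-attention layers keep
# the full context. Names and shapes are made up for illustration.
import torch


class SlidingWindowCache:
    """Fixed-size KV cache that keeps only the most recent `cache_len` positions."""

    def __init__(self, cache_len: int, n_heads: int, head_dim: int):
        self.cache_len = cache_len
        # A real implementation would also track the number of valid
        # (non-padding) positions; zeros stand in for padding here.
        self.k = torch.zeros(1, n_heads, cache_len, head_dim)
        self.v = torch.zeros(1, n_heads, cache_len, head_dim)

    def update(self, new_k: torch.Tensor, new_v: torch.Tensor):
        """Shift out the oldest entries and append the new ones."""
        n_new = new_k.size(2)
        if n_new >= self.cache_len:
            # The new tokens alone fill the window; keep only the latest.
            self.k = new_k[:, :, -self.cache_len:, :]
            self.v = new_v[:, :, -self.cache_len:, :]
        else:
            self.k = torch.cat([self.k[:, :, n_new:, :], new_k], dim=2)
            self.v = torch.cat([self.v[:, :, n_new:, :], new_v], dim=2)
        return self.k, self.v


# Local-global pattern: e.g. every 4th layer attends globally, the rest
# attend locally over a 512-token window (numbers chosen arbitrarily).
n_layers, window, max_context = 8, 512, 4096
caches = [
    SlidingWindowCache(
        max_context if i % 4 == 0 else window, n_heads=8, head_dim=64
    )
    for i in range(n_layers)
]

# One decode step: each layer appends its new K/V under its own cache length.
new_k = torch.randn(1, 8, 1, 64)
new_v = torch.randn(1, 8, 1, 64)
for cache in caches:
    k, v = cache.update(new_k, new_v)  # attention would then read k, v
```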
File tree
3 files changed: +264 -136 lines
- examples/models/llama
  - runner
  - tests