Commit 57dd4dd
Static attention: support local-global attention (pytorch#13043)
Summary:
Pull Request resolved: pytorch#13043
Runtime: support different cache lengths for different layers.
Python: add the sliding-window cache update that already existed in the runtime.
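To illustrate the sliding-window cache update this commit describes, here is a minimal sketch of a ring-buffer KV cache that keeps only the most recent `window` tokens. This is not the PR's actual implementation; the function name and tensor layout are assumptions for illustration.

```python
import torch

def update_sliding_window_cache(cache: torch.Tensor, new_kv: torch.Tensor, pos: int) -> int:
    """Write new_kv into a fixed-size sliding-window cache (hypothetical helper).

    cache:  [window, head_dim] buffer holding the most recent tokens.
    new_kv: [seq_len, head_dim] entries to append.
    pos:    number of tokens written so far.

    Older entries are overwritten ring-buffer style once the window is
    full, so attention over this cache is local to the last `window` tokens.
    Returns the updated token position.
    """
    window = cache.shape[0]
    for i in range(new_kv.shape[0]):
        cache[(pos + i) % window] = new_kv[i]
    return pos + new_kv.shape[0]
```

In a local-global scheme, layers using this sliding-window cache would allocate a short `window`, while global-attention layers keep a full-length cache, which is why the runtime must support different cache lengths per layer.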
Reviewed By: billmguo
Differential Revision: D792676441
File tree
3 files changed: +273 −139 lines
- examples/models/llama
  - runner
  - tests