Commit 57dd4dd
Static attention: support local-global attention (pytorch#13043)
Summary:
Pull Request resolved: pytorch#13043
Runtime: support different cache lengths for different layers.
Python: add the sliding-window cache update that already existed in the runtime.
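To illustrate the sliding-window cache update this commit describes, here is a minimal sketch of a ring-buffer KV cache that keeps only the most recent `window` tokens. This is not the PR's actual implementation; the function name and tensor layout are assumptions for illustration.

```python
import torch

def update_sliding_window_cache(cache: torch.Tensor, new_kv: torch.Tensor, pos: int) -> int:
    """Write new_kv into a fixed-size sliding-window cache (hypothetical helper).

    cache:  [window, head_dim] buffer holding the most recent tokens.
    new_kv: [seq_len, head_dim] entries to append.
    pos:    number of tokens written so far.

    Older entries are overwritten ring-buffer style once the window is
    full, so attention over this cache is local to the last `window` tokens.
    Returns the updated token position.
    """
    window = cache.shape[0]
    for i in range(new_kv.shape[0]):
        cache[(pos + i) % window] = new_kv[i]
    return pos + new_kv.shape[0]
```

In a local-global scheme, layers using this sliding-window cache would allocate a short `window`, while global-attention layers keep a full-length cache, which is why the runtime must support different cache lengths per layer.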
Reviewed By: billmguo
Differential Revision: D792676441
File tree
3 files changed: +273 −139 lines
- examples/models/llama
  - runner
  - tests