Commit 2c8b44c
[None][refactor] Move _update_k_cache into sparse_attn_indexer
Move the _update_k_cache call to the top of sparse_attn_indexer so that
the k cache is populated immediately before the prefill chunks gather from it.
Remove pre_indexer (now redundant); forward() and forward_dsa_proj
both call pre_indexer_proj directly.
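The ordering argument above (write the k cache first, then let each prefill chunk gather from it) can be illustrated with a small toy model. This is a hypothetical sketch, not TensorRT-LLM code: `ToyKCache`, `ToyIndexer`, the doubling "projection", the `(start, end)` chunk tuples and all method signatures are assumptions chosen only to mirror the names in the commit message.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class ToyKCache:
    """Hypothetical k cache: maps token position -> key vector."""
    slots: Dict[int, List[float]] = field(default_factory=dict)

    def write(self, positions: Sequence[int], keys: Sequence[List[float]]) -> None:
        for p, k in zip(positions, keys):
            self.slots[p] = k

    def gather(self, positions: Sequence[int]) -> List[List[float]]:
        # Raises KeyError if a position was never written to the cache.
        return [self.slots[p] for p in positions]


class ToyIndexer:
    """Hypothetical stand-in; names echo the commit, behavior is illustrative."""

    def __init__(self) -> None:
        self.k_cache = ToyKCache()

    def pre_indexer_proj(self, hidden: List[List[float]]) -> List[List[float]]:
        # Assumed toy projection: key = 2 * hidden state.
        return [[2.0 * x for x in h] for h in hidden]

    def _update_k_cache(self, positions: Sequence[int], keys: List[List[float]]) -> None:
        self.k_cache.write(positions, keys)

    def sparse_attn_indexer(
        self,
        positions: Sequence[int],
        keys: List[List[float]],
        chunks: List[Tuple[int, int]],
    ) -> List[List[List[float]]]:
        # Populate the cache first, so every chunk below sees all keys it needs.
        self._update_k_cache(positions, keys)
        # Each prefill chunk gathers keys for all positions before its end (causal).
        return [self.k_cache.gather(range(0, end)) for _, end in chunks]

    def forward(self, hidden: List[List[float]], chunks: List[Tuple[int, int]]):
        # Calls the projection directly; there is no intermediate wrapper.
        keys = self.pre_indexer_proj(hidden)
        return self.sparse_attn_indexer(list(range(len(hidden))), keys, chunks)

    def forward_dsa_proj(self, hidden: List[List[float]]) -> List[List[float]]:
        # Second caller of the projection, also without a wrapper.
        return self.pre_indexer_proj(hidden)


indexer = ToyIndexer()
out = indexer.forward([[1.0], [2.0], [3.0], [4.0]], chunks=[(0, 2), (2, 4)])
print(out[0])  # [[2.0], [4.0]]
print(out[1])  # [[2.0], [4.0], [6.0], [8.0]]
```

In this toy, moving the `_update_k_cache` call below the gather would make the first chunk raise `KeyError` on an empty cache. That is the kind of ordering hazard that placing the update at the top of the indexer rules out.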
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Parent: 165d4b9
2 files changed (+5, -23) under tensorrt_llm/_torch:
- attention_backend/sparse
- modules
Hunk summary (line contents not preserved; old → new starting line):
First file in the diff
- 1395 → 1395: 3 lines added (new 1398-1400)
- 1669 → 1672: 18 lines removed (old 1672-1689)
- 1733 → 1718: 2 lines removed (old 1736-1737), 2 lines added (new 1721-1722)
Second file in the diff
- 1786 → 1786: 3 lines removed (old 1789-1791)