Commit 97bc680
feat: support kv cache reuse for MLA (#3571)
* support kv cache reuse for MLA
load compressed_kv and k_pe and do up-projection
use 192/128 head size MLA context kernel
support Blackwell and Hopper now
Signed-off-by: Zhen Huang <[email protected]>
* add CI test
Signed-off-by: Zhen Huang <[email protected]>
* fix: set k_pe head_num to 1 for kernel 2 and kernel 2V2
Signed-off-by: Mingyang Jiang <[email protected]>
* resolve comments
Signed-off-by: Zhen Huang <[email protected]>
* use GPTJ style RoPE for MLA
Signed-off-by: Zhen Huang <[email protected]>
* fix rebase error and some docs
Signed-off-by: Zhen Huang <[email protected]>
* fix kv_lens
Signed-off-by: Zhen Huang <[email protected]>
* tiny fix
Signed-off-by: Zhen Huang <[email protected]>
* fix torch compile
Signed-off-by: Zhen Huang <[email protected]>
* fix: use normal device memory instead of pinned memory for unit test
Signed-off-by: Mingyang Jiang <[email protected]>
* fix L0 tests
Signed-off-by: Zhen Huang <[email protected]>
* fix torch compile after rebase
Signed-off-by: Zhen Huang <[email protected]>
* resolve comments
Signed-off-by: Zhen Huang <[email protected]>
* resolve comments again
Signed-off-by: Zhen Huang <[email protected]>
---------
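For readers unfamiliar with the "GPTJ style RoPE" item above, the following is a minimal pure-Python sketch of the usual distinction: GPT-J style rotates adjacent element pairs, while NeoX style pairs element `i` with element `i + dim/2`. The function names and the `base` default are illustrative assumptions and are not taken from this commit's kernels.

```python
import math


def rope_gptj(x, position, base=10000.0):
    """GPT-J (interleaved) RoPE: rotate pairs (x[0], x[1]), (x[2], x[3]), ..."""
    dim = len(x)
    out = list(x)
    for i in range(dim // 2):
        theta = position * base ** (-2.0 * i / dim)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def rope_neox(x, position, base=10000.0):
    """NeoX (half-split) RoPE: rotate pairs (x[i], x[i + dim/2])."""
    dim = len(x)
    half = dim // 2
    out = list(x)
    for i in range(half):
        theta = position * base ** (-2.0 * i / dim)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out


# Same input, same position, different pairing of elements.
vec = [1.0, 0.0, 0.0, 0.0]
gptj_out = rope_gptj(vec, position=1)
neox_out = rope_neox(vec, position=1)
```

Both variants apply the same rotation angles; they differ only in which elements form a pair, so a cache written with one layout cannot be read back with the other.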
Signed-off-by: Zhen Huang <[email protected]>
Signed-off-by: Mingyang Jiang <[email protected]>
Signed-off-by: zhhuang-nv <[email protected]>
Co-authored-by: Mingyang Jiang <[email protected]>
1 parent b4e5df0 commit 97bc680
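The first item of the commit message (load `compressed_kv` and `k_pe`, then up-project) can be sketched as below. This is a toy pure-Python illustration under stated assumptions, not the kernel added by this commit: the dimensions are deliberately tiny, the weight names `w_uk` and `w_uv` are hypothetical, and the reading of "192/128" as a 192-wide Q/K head (128 non-positional plus 64 RoPE dimensions) with a 128-wide V head is inferred from the DeepSeek-V3 configuration rather than stated in the message.

```python
import random

# Toy dimensions (hypothetical; far smaller than a real model).
KV_LORA_RANK = 8   # width of the cached compressed_kv latent
NOPE_DIM = 4       # per-head non-positional K width
ROPE_DIM = 2       # width of k_pe, stored once per token (head_num = 1)
V_DIM = 4          # per-head V width
NUM_HEADS = 3


def matvec(mat, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in mat]


def up_project(compressed_kv, k_pe, w_uk, w_uv):
    """Rebuild per-head K and V for one cached token.

    compressed_kv: KV_LORA_RANK floats read from the cache
    k_pe:          ROPE_DIM floats read from the cache (single head)
    w_uk, w_uv:    per-head up-projection matrices (hypothetical names)
    """
    keys, values = [], []
    for head in range(NUM_HEADS):
        k_nope = matvec(w_uk[head], compressed_kv)
        # Each head's K is its own k_nope plus the shared k_pe.
        keys.append(k_nope + list(k_pe))
        values.append(matvec(w_uv[head], compressed_kv))
    return keys, values


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
w_uk = [random_matrix(rng, NOPE_DIM, KV_LORA_RANK) for _ in range(NUM_HEADS)]
w_uv = [random_matrix(rng, V_DIM, KV_LORA_RANK) for _ in range(NUM_HEADS)]
cached_kv = [rng.uniform(-1.0, 1.0) for _ in range(KV_LORA_RANK)]
cached_k_pe = [rng.uniform(-1.0, 1.0) for _ in range(ROPE_DIM)]

keys, values = up_project(cached_kv, cached_k_pe, w_uk, w_uv)
```

In this sketch the cache holds only `KV_LORA_RANK + ROPE_DIM` values per token, and a reused prefix is expanded back into full per-head K and V before the context attention step. That is also why the K head is wider than the V head.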
File tree
32 files changed, +14638 -9067 lines changed
- cpp
  - include/tensorrt_llm/batch_manager
  - tensorrt_llm
    - batch_manager
    - common
    - kernels
      - contextFusedMultiHeadAttention
        - cubin
    - thop
  - tests/unit_tests
    - batch_manager
    - kernels
- examples/models/core/deepseek_v3
- tensorrt_llm
  - _torch
    - attention_backend
    - custom_ops
    - modules
    - pyexecutor
- tests/integration
  - defs/accuracy
  - test_lists/test-db