Commit e363d4d
Correct attention handling in ModelConfig and KVCacheManager
Fix the model config binding so KVCacheManager can compute required cache blocks with accurate head/hidden sizes.
Changes:
- Updated hidden size and key-value head calculations with the correct attention TP and CP sizes.
- Added enable_attention_dp parameter to KVCacheManager for improved resource management.
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>

1 parent 6732c76 commit e363d4d
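To illustrate why the attention TP and CP sizes matter for the cache-block computation described in the commit message, here is a minimal sketch. It is not the TensorRT-LLM implementation: `AttentionShape`, `attention_shards`, `per_rank_sizes`, and `num_cache_blocks` are hypothetical names, and the rule that attention DP collapses the attention TP size to 1 (leaving CP unchanged) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    """Hypothetical model description (not a TensorRT-LLM class)."""
    num_layers: int
    hidden_size: int
    num_kv_heads: int
    head_dim: int


def attention_shards(tp_size: int, cp_size: int,
                     enable_attention_dp: bool) -> tuple[int, int]:
    """Return (attention TP size, attention CP size).

    Assumption: with attention DP each rank keeps the full attention
    weights, so the attention TP size becomes 1; CP is left unchanged.
    """
    attn_tp = 1 if enable_attention_dp else tp_size
    return attn_tp, cp_size


def per_rank_sizes(shape: AttentionShape, attn_tp: int,
                   attn_cp: int) -> tuple[int, int]:
    """Return (hidden size, KV heads) held by a single rank."""
    hidden = shape.hidden_size // attn_tp
    shards = attn_tp * attn_cp
    # Ceiling division: a rank holds at least one KV head even when
    # there are fewer KV heads than shards.
    kv_heads = -(-shape.num_kv_heads // shards)
    return hidden, kv_heads


def num_cache_blocks(shape: AttentionShape, kv_heads: int,
                     tokens_per_block: int, dtype_bytes: int,
                     budget_bytes: int) -> int:
    """Number of KV cache blocks that fit into a memory budget."""
    # Factor 2 accounts for storing both keys and values.
    bytes_per_token = (2 * shape.num_layers * kv_heads
                       * shape.head_dim * dtype_bytes)
    return budget_bytes // (bytes_per_token * tokens_per_block)


shape = AttentionShape(num_layers=32, hidden_size=4096,
                       num_kv_heads=8, head_dim=128)
budget = 8 * 1024**3  # 8 GiB reserved for the KV cache (illustrative)

for attention_dp in (False, True):
    attn_tp, attn_cp = attention_shards(4, 1, attention_dp)
    hidden, kv_heads = per_rank_sizes(shape, attn_tp, attn_cp)
    blocks = num_cache_blocks(shape, kv_heads, tokens_per_block=32,
                              dtype_bytes=2, budget_bytes=budget)
    print(attention_dp, hidden, kv_heads, blocks)
# prints:
# False 1024 2 8192
# True 4096 8 2048
```

In this sketch, dividing by the plain TP size while attention DP is enabled would report 2 KV heads per rank instead of 8 and overestimate the number of blocks by a factor of four, which is the kind of mismatch the commit message describes.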
File tree

2 files changed, +11 -6 lines changed

- tensorrt_llm/_torch
  - pyexecutor
Only the line-number columns of the diff survive; the changed source lines themselves are not shown. The hunks recoverable from those columns are:

| File | Original lines | Diff lines | Change |
|---|---|---|---|
| First file | 495-504 | 495-509 | 5 lines added (diff lines 498-502); original lines 499 and 501 replaced by diff lines 504 and 506 |
| First file | 523-535 | 528-539 | original line 526 replaced by diff line 531; original lines 531-532 replaced by diff line 536 |
| Second file | 1133-1139 | 1133-1140 | original line 1136 replaced by diff lines 1136-1137 |