Commit 1d61b14
This is a combination of 16 commits.
Implement CUDA Graph-compatible multi-LoRA support
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
Refactor CUDA Graph LoRA integration to support precomputed leading dimensions
- Updated `cuda_graph_grouped_gemm` and `cuda_graph_splitk_grouped_gemm` functions to accept leading dimension pointers for A, B, C, and D matrices.
- Modified `LoraImpl` to retrieve and pass leading dimension pointers during GEMM operations.
- Enhanced `CudaGraphLoraParams` to manage leading dimensions for each layer and module.
- Adjusted `CudaGraphLoraManager` to initialize parameters based on actual layer configurations from the PEFT table.
- Improved handling of layer-specific parameters to ensure compatibility with CUDA Graph operations.
This refactor aims to optimize performance by leveraging precomputed leading dimensions, reducing overhead during GEMM execution.
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
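As a rough illustration of the precomputed leading dimensions described in the refactor above, here is a minimal Python sketch. It is not the TensorRT-LLM implementation: `GemmShape`, `LeadingDims`, `precompute_leading_dims`, `gemm_with_ld`, and the `(layer_idx, module_name)` keys are hypothetical names, and dense row-major storage is assumed. The idea shown is only that leading dimensions depend on the layer and module shapes, so they can be computed once and reused on every GEMM call instead of being rederived each time.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class GemmShape:
    m: int  # rows of A and of C/D
    n: int  # columns of B and of C/D
    k: int  # shared inner dimension


@dataclass(frozen=True)
class LeadingDims:
    lda: int
    ldb: int
    ldc: int
    ldd: int


def row_major_leading_dims(shape: GemmShape) -> LeadingDims:
    # Assumption: dense row-major buffers, so the leading dimension is the
    # number of elements between the starts of two consecutive rows.
    return LeadingDims(lda=shape.k, ldb=shape.n, ldc=shape.n, ldd=shape.n)


def precompute_leading_dims(
    problems: Dict[Tuple[int, str], GemmShape],
) -> Dict[Tuple[int, str], LeadingDims]:
    # Hypothetical table keyed by (layer_idx, module_name); built once up front.
    return {key: row_major_leading_dims(shape) for key, shape in problems.items()}


def gemm_with_ld(
    a: List[float], b: List[float], shape: GemmShape, ld: LeadingDims
) -> List[float]:
    # Reference D = A @ B over flat buffers, indexed through leading dimensions.
    d = [0.0] * (shape.m * ld.ldd)
    for i in range(shape.m):
        for j in range(shape.n):
            acc = 0.0
            for p in range(shape.k):
                acc += a[i * ld.lda + p] * b[p * ld.ldb + j]
            d[i * ld.ldd + j] = acc
    return d


# One hypothetical (layer, module) entry with a 2x3 by 3x2 problem.
problems = {(0, "attn_q"): GemmShape(m=2, n=2, k=3)}
ld_table = precompute_leading_dims(problems)

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # A is 2x3, row-major
b = [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # B is 3x2, row-major
result = gemm_with_ld(a, b, problems[(0, "attn_q")], ld_table[(0, "attn_q")])
```

In a CUDA Graph setting the analogous table would presumably live in device-visible memory at a fixed address, so that replaying the graph reads the same pointers each time; that detail is an assumption and is not modeled here.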
Bug fixes
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
Move input prep to graph
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
Fix bug in adapter size
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
Pass all but `test_llama_7b_lora_config_overrides_peft_cache_config` on L40s
Graph seems to capture code outside of the captured function?
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
Pass all tests
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
Sync slot manager with C++
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
Update kernel alignment selection
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
Fix kernel workspace sizes
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
Use pinned memory for memcpy; remove assert in slot manager eviction
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
Add param fill fused kernel
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
Disable torch nvtx emit
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
Disable init manager without cuda graph
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
Update CI
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
Moved files
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
1 parent 2695d70 commit 1d61b14
File tree
24 files changed, +3343 -137 lines changed
- cpp
- include/tensorrt_llm/batch_manager
- tensorrt_llm
- batch_manager
- kernels
- lora
- nanobind/batch_manager
- pybind/batch_manager
- thop
- tensorrt_llm
- _torch
- modules
- peft/lora
- pyexecutor
- executor
- tests/unittest/llmapi