Commit 33ac890
[TRTLLM-11289][perf] Eliminate contiguous copies in CuTe DSL BF16 BMM path
Add wrapper_strided to PersistentDenseGemmKernel that accepts explicit A
tensor strides, enabling non-contiguous views (e.g. from .transpose()) to
be passed directly to TMA without .contiguous() copies. Update the BMM
runner to compute and pass A strides instead of forcing contiguous tensors,
removing the direct_copy_kernel_cuda overhead between attention and BMM.
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
1 parent 22261e9 commit 33ac890
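The stride idea in the commit message can be illustrated without the kernel itself. The sketch below is pure Python and does not use `wrapper_strided`, TMA, or PyTorch. The helper names (`contiguous_strides`, `transpose_view`, `load`, `make_contiguous`) and the `(M, B, K)` storage layout are assumptions chosen for illustration, not taken from the commit. It shows that transposing a view only swaps shape and stride metadata, and that a reader given explicit strides sees the same elements a contiguous copy would hold.

```python
from itertools import product


def contiguous_strides(shape):
    """Row-major strides (in elements) for a dense tensor of this shape."""
    strides = []
    running = 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


def transpose_view(shape, strides, dim0, dim1):
    """Swap two dimensions of a view. Only metadata changes; no data moves."""
    shape, strides = list(shape), list(strides)
    shape[dim0], shape[dim1] = shape[dim1], shape[dim0]
    strides[dim0], strides[dim1] = strides[dim1], strides[dim0]
    return tuple(shape), tuple(strides)


def load(buffer, strides, index):
    """Read one element of a strided view from flat storage."""
    return buffer[sum(i * s for i, s in zip(index, strides))]


def make_contiguous(buffer, shape, strides):
    """Materialize a dense copy of a strided view (the copy being avoided)."""
    return [load(buffer, strides, idx)
            for idx in product(*(range(n) for n in shape))]


# Hypothetical layout: storage is laid out as (M, B, K), consumer wants (B, M, K).
M, B, K = 2, 3, 4
storage = list(range(M * B * K))
src_shape = (M, B, K)
src_strides = contiguous_strides(src_shape)

# The transposed view reuses `storage`; only shape and strides differ.
a_shape, a_strides = transpose_view(src_shape, src_strides, 0, 1)
is_contiguous = a_strides == contiguous_strides(a_shape)

# A consumer that assumes dense strides would need this extra copy.
copied = make_contiguous(storage, a_shape, a_strides)
dense_strides = contiguous_strides(a_shape)

# A consumer that accepts explicit strides reads `storage` directly
# and sees the same values as the dense copy.
same = all(
    load(storage, a_strides, idx) == load(copied, dense_strides, idx)
    for idx in product(*(range(n) for n in a_shape))
)
print(a_shape, a_strides, is_contiguous, same)
```

In this toy layout the transposed view has strides `(4, 12, 1)`, while a dense tensor of the same shape would have `(8, 4, 1)`. A consumer that takes the strides as an explicit argument can therefore read the original storage as-is, and only a consumer that assumes dense strides needs the copy.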
File tree
2 files changed, +81 -24 lines changed
- tensorrt_llm/_torch
  - custom_ops
  - cute_dsl_kernels/blackwell
Lines changed: 22 additions & 24 deletions
| Original file lines | New file lines | Change |
|---|---|---|
| 3850-3868 | 3850-3854 | 19 lines removed, 5 lines added |
| | 3861-3870 | 10 lines added |
| 3929 | 3925 | 1 line replaced |
| | 3933-3934 | 2 lines added |
| | 3954-3955 | 2 lines added |
| | 3966-3967 | 2 lines added |
| 3969-3972 | | 4 lines removed |
Lines changed: 59 additions & 0 deletions
| Original file lines | New file lines | Change |
|---|---|---|
| | 1013-1071 | 59 lines added |