Commit f49a148
[SWDEV-539076] Initial naive foreach autotune support (#2377)
Adds initial autotuning support for foreach kernels, required for
https://ontrack-internal.amd.com/browse/SWDEV-539076
Up to roughly 4x speedup for some kernels (triton_for_fused_18: 4.986 ms to 1.273 ms)
Before:
triton_for_fused_18.kd 🔍 | 4.986 ms | 4.986 ms | 2.493 ms | 2 |
triton_for_fused_6.kd 🔍 | 0.098 ms | 0.098 ms | 0.049 ms | 2 |
triton_for_fused_7.kd 🔍 | 0.036 ms | 0.036 ms | 0.018 ms | 2 |
After:
triton_for_fused_18.kd 🔍 | 1.273 ms | 1.273 ms | 0.636 ms | 2 |
triton_for_fused_6.kd 🔍 | 0.044 ms | 0.044 ms | 0.022 ms | 2 |
triton_for_fused_7.kd 🔍 | 0.024 ms | 0.024 ms | 0.012 ms | 2 |
(cherry picked from commit f07b7f7)
(cherry picked from commit ed0d0a7)
1 parent 3a2af82 commit f49a148
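The "4x" figure can be checked against the Before/After listings in the commit message. The snippet below is a minimal sketch that computes per-kernel speedups from the first timing column. The column headers are not given in the source, so reading that column as total kernel time is an assumption, and the names `before_ms`, `after_ms`, and `speedups` are illustrative only.

```python
# Timings in milliseconds, copied from the first timing column of the
# Before/After listings. What that column measures is not stated in the
# source; treating it as total kernel time is an assumption.
before_ms = {
    "triton_for_fused_18.kd": 4.986,
    "triton_for_fused_6.kd": 0.098,
    "triton_for_fused_7.kd": 0.036,
}
after_ms = {
    "triton_for_fused_18.kd": 1.273,
    "triton_for_fused_6.kd": 0.044,
    "triton_for_fused_7.kd": 0.024,
}


def speedups(before, after):
    """Return before/after ratio per kernel (values above 1 mean faster)."""
    return {name: before[name] / after[name] for name in before}


ratios = speedups(before_ms, after_ms)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

Under that reading, the largest kernel improves by a little under 4x, while the two smaller kernels improve by smaller factors, which matches the "some kernels" wording.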
2 files changed, +13 -4 lines changed
- torch/_inductor
  - codegen
  - runtime