Commit af2ce88
[SWDEV-539076] Initial naive foreach autotune support (#2377)
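Adds initial autotuning support for foreach kernels, required for
https://ontrack-internal.amd.com/browse/SWDEV-539076
Roughly 4x improvement for the largest kernel (triton_for_fused_18: 4.986 ms to 1.273 ms); the smaller kernels listed below improve by about 1.5x to 2.2x.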
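For readers unfamiliar with the idea, a naive autotuner can be sketched as "measure every candidate configuration and keep the fastest". The snippet below is only an illustration of that idea in plain Python. It is not the code from this commit and does not use any Inductor API; the names `benchmark`, `pick_fastest`, and the `num_warps` config key are hypothetical choices made for this sketch.

```python
import time
from typing import Callable, Dict, List

Config = Dict[str, int]  # hypothetical config shape, e.g. {"num_warps": 4}


def benchmark(fn: Callable[[], None], repeats: int = 3) -> float:
    """Return the mean wall-clock seconds per call of fn, after one warm-up call."""
    fn()  # warm-up call, excluded from the timing
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def pick_fastest(candidates: List[Config], measure: Callable[[Config], float]) -> Config:
    """Naive autotune: measure every candidate and return the one with the lowest cost."""
    if not candidates:
        raise ValueError("need at least one candidate config")
    return min(candidates, key=measure)


# Example with a stand-in cost table instead of real kernel launches.
fake_cost_ms = {1: 3.0, 4: 1.0, 8: 2.0}
candidates = [{"num_warps": w} for w in fake_cost_ms]
best = pick_fastest(candidates, lambda cfg: fake_cost_ms[cfg["num_warps"]])
print(best)
```

In a real setting, `measure` would launch the kernel with the given config and time it, for example through `benchmark`. The cost table here only keeps the example deterministic.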
Before:
triton_for_fused_18.kd 🔍 | 4.986 ms | 4.986 ms | 2.493 ms | 2 |
triton_for_fused_6.kd 🔍 | 0.098 ms | 0.098 ms | 0.049 ms | 2 |
triton_for_fused_7.kd 🔍 | 0.036 ms | 0.036 ms | 0.018 ms | 2 |
After:
triton_for_fused_18.kd 🔍 | 1.273 ms | 1.273 ms | 0.636 ms | 2 |
triton_for_fused_6.kd 🔍 | 0.044 ms | 0.044 ms | 0.022 ms | 2 |
triton_for_fused_7.kd 🔍 | 0.024 ms | 0.024 ms | 0.012 ms | 2 |
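The per-kernel speedup can be checked with a few lines of Python. This is a small sketch that assumes the first timing column is the figure to compare (the column headers are not given in the commit message); the other timing columns give essentially the same ratios.

```python
# Timings in ms, copied from the first timing column of the Before/After listings.
before_ms = {
    "triton_for_fused_18": 4.986,
    "triton_for_fused_6": 0.098,
    "triton_for_fused_7": 0.036,
}
after_ms = {
    "triton_for_fused_18": 1.273,
    "triton_for_fused_6": 0.044,
    "triton_for_fused_7": 0.024,
}


def speedups(before, after):
    """Return before/after time ratios per kernel (values above 1 mean faster)."""
    return {name: before[name] / after[name] for name in before}


result = speedups(before_ms, after_ms)
for name, ratio in result.items():
    print(f"{name}: {ratio:.2f}x")
# triton_for_fused_18: 3.92x
# triton_for_fused_6: 2.23x
# triton_for_fused_7: 1.50x
```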
(cherry picked from commit f07b7f7)
1 parent bbb1d6e commit af2ce88
File tree
2 files changed, +13 -4 lines changed
- torch/_inductor
  - codegen
  - runtime
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 614 | 614 | | |
| 615 | 615 | | |
| 616 | 616 | | |
| 617 | | - | |
| | 617 | + | |
| 618 | 618 | | |
| 619 | 619 | | |
| 620 | 620 | | |

| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 2779 | 2779 | | |
| 2780 | 2780 | | |
| 2781 | 2781 | | |
| 2782 | | - | |
| | 2782 | + | |
| 2783 | 2783 | | |
| 2784 | 2784 | | |
| 2785 | 2785 | | |
| | 2786 | + | |
| | 2787 | + | |
| | 2788 | + | |
| | 2789 | + | |
| | 2790 | + | |
| | 2791 | + | |
| | 2792 | + | |
| | 2793 | + | |
| | 2794 | + | |
| | 2795 | + | |
| 2786 | 2796 | | |
| 2787 | 2797 | | |
| 2788 | | - | |
| | 2798 | + | |
| 2789 | 2799 | | |
| 2790 | 2800 | | |
| 2791 | 2801 | | |
| 2792 | 2802 | | |
| 2793 | 2803 | | |
| 2794 | 2804 | | |
| 2795 | | - | |
| 2796 | 2805 | | |
| 2797 | 2806 | | |
| 2798 | 2807 | | |