Commit db3ba66
Bug fix and optimisation for persistent reduction kernel tuning (#2596)
The original PR (#2417) had incorrect indentation. This PR updates the
logic so that autotune always adds the tiny configs; otherwise only the
hinted configs are used.
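The selection rule described above can be sketched roughly as follows. This is only an illustration of one reading of the message (autotune enabled: hinted configs plus tiny configs; autotune disabled: hinted configs only). The function `select_configs`, its parameters, and the string config names are hypothetical and are not taken from the changed file.

```python
from typing import List


def select_configs(hinted: List[str], tiny: List[str], autotune: bool) -> List[str]:
    """Hypothetical helper illustrating the rule; not the actual inductor code."""
    if autotune:
        # Autotune path: start from the hinted configs and always append
        # the tiny configs, skipping duplicates and preserving order.
        configs = list(hinted)
        for cfg in tiny:
            if cfg not in configs:
                configs.append(cfg)
        return configs
    # No autotune: use the hinted configs only.
    return list(hinted)


if __name__ == "__main__":
    hinted = ["xblock=8"]   # assumed placeholder config names
    tiny = ["xblock=1"]
    print(select_configs(hinted, tiny, autotune=True))
    print(select_configs(hinted, tiny, autotune=False))
```

Note that in a sketch like this the behaviour depends entirely on which statements sit inside the `if autotune:` block, which is the kind of mistake an indentation error can introduce.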
Tested locally on test_torchinductor:
Ran 894 tests in 952.242s
FAILED (failures=1, skipped=28)
Autotune runs also completed for the microbenchmark models:
Microbenchmark for network : resnet152
Num devices: 1
Dtype: FP32
Mini batch size [img] : 64
Time per mini-batch : 0.09107530117034912
Throughput [img/sec] : 702.7152167226226
1 parent 675f868 commit db3ba66
1 file changed, +14 -14 lines changed
Diff: lines 2598-2611 of the original file are removed and 14 new lines are added at 2598-2611; lines 2595-2597 and 2612-2614 are unchanged context.