Commit 5cb5675
[Inductor] optimize the heuristics of parallel reduction (pytorch#149614)
Fix pytorch#148639.
Summary:
Optimize the heuristics of parallel reduction. When the first inner loop beyond the maximum parallel depth has far more steps than all outer loops within the maximum parallel depth combined, move the starting depth of parallelism to that first inner loop and recalculate the maximum parallel depth.
I ran the Inductor benchmark with this PR on CPU. The timm model poolformer_m36 (BF16) improves by about 25%, and no performance regressions were observed.
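The heuristic described in the summary can be sketched roughly as follows. This is a minimal standalone illustration, not the code changed by this commit: the names `Loop`, `ParallelDepth`, `choose_parallel_depth`, and `_run_length` are hypothetical, "maximum parallel depth" is assumed to mean the run of consecutive loops of the same kind (all reduction or all non-reduction), the combined step count of the outer loops is assumed to be their product, and `ratio_threshold=100` is an arbitrary stand-in for "much larger".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Loop:
    num_steps: int      # number of iterations at this loop level
    is_reduction: bool  # True if this level is a reduction loop


@dataclass
class ParallelDepth:
    start_depth: int     # index of the first loop level to parallelize
    parallel_depth: int  # number of consecutive levels to parallelize


def _run_length(loops: List[Loop], start: int) -> int:
    """Count consecutive loops from `start` that share the same kind."""
    kind = loops[start].is_reduction
    depth = 0
    for loop in loops[start:]:
        if loop.is_reduction != kind:
            break
        depth += 1
    return depth


def choose_parallel_depth(loops: List[Loop], ratio_threshold: int = 100) -> ParallelDepth:
    """Hypothetical sketch; `ratio_threshold` is an assumed value."""
    if not loops:
        return ParallelDepth(start_depth=0, parallel_depth=0)

    # Default: parallelize from the outermost loop over the first run.
    max_depth = _run_length(loops, 0)
    outer_steps = 1
    for loop in loops[:max_depth]:
        outer_steps *= loop.num_steps

    # If the first inner loop beyond that run dwarfs the outer loops,
    # start parallelism there and recompute the depth from that level.
    if (
        max_depth < len(loops)
        and outer_steps * ratio_threshold < loops[max_depth].num_steps
    ):
        start = max_depth
        return ParallelDepth(start_depth=start, parallel_depth=_run_length(loops, start))

    return ParallelDepth(start_depth=0, parallel_depth=max_depth)
```

For example, with a tiny outer loop of 2 steps wrapped around a reduction loop of 100000 steps, this sketch moves the start of parallelism to the reduction loop; with large outer loops and a small inner reduction, it keeps the start at depth 0.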
Pull Request resolved: pytorch#149614
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/jansel
Parent: 0f12951
1 file changed: +7, -6 lines (one hunk, original lines 5457-5476, new lines 5457-5477)