Commit 5cb5675

jiayisunx authored and pytorchmergebot committed
[Inductor] optimize the heuristics of parallel reduction (pytorch#149614)
Fix pytorch#148639.

Summary:
Optimize the heuristics of parallel reduction: When the number of steps of the first inner loop beyond the maximum parallel depth is much larger than the number of steps of all outer loops within the maximum parallel depth, change the starting depth of parallelism to the first inner loop and recalculate the maximum parallel depth.

I ran the Inductor benchmark with this PR on CPU. A timm model poolformer_m36 BF16 has about 25% performance improvement, and no performance regression is seen.

Pull Request resolved: pytorch#149614
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/jansel
1 parent 0f12951 commit 5cb5675

1 file changed: +7 -6 lines changed

torch/_inductor/codegen/cpp.py

Lines changed: 7 additions & 6 deletions
@@ -5457,20 +5457,21 @@ def max_parallel_depth(self):
         start_depth = 0
         max_depth = 0
         is_reduction = self.loops[0].is_reduction
-        loop_sizes = sympy.Integer(1)
+        num_steps = sympy.Integer(1)
         for loop in self.loops:
             if loop.is_reduction != is_reduction:
                 break
-            loop_sizes = loop_sizes * loop.size
+            num_steps = num_steps * FloorDiv(loop.size, loop.steps)
             max_depth += 1
 
-        # When the range of the first inner loop is much larger than the range of all outer loops,
-        # change `start_depth` to the first inner loop and recalculate `max_depth`.
+        # When the number of steps of the first inner loop is much larger than the number of steps of
+        # all outer loops, change `start_depth` to the first inner loop and recalculate `max_depth`.
         if (
             max_depth < len(self.loops)
-            and isinstance(loop_sizes, sympy.Integer)
+            and isinstance(num_steps, sympy.Integer)
             and isinstance(self.loops[max_depth].size, sympy.Integer)
-            and loop_sizes * 300 < self.loops[max_depth].size
+            and num_steps * 300
+            < FloorDiv(self.loops[max_depth].size, self.loops[max_depth].steps)
         ):
             start_depth = max_depth
             max_depth = 0
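
The effect of comparing step counts instead of raw loop sizes can be illustrated with a small standalone sketch. It is not the Inductor implementation: `Loop`, `num_steps`, and `pick_parallel_start` are hypothetical names, plain Python integers stand in for the sympy expressions, and the recalculation of `max_depth` after the switch (which the hunk above does not show) is assumed to count consecutive loops of the same kind starting from the new `start_depth`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Loop:
    """Hypothetical stand-in for one loop level (not the real Inductor class)."""
    size: int                   # iteration range of the loop
    steps: int = 1              # stride per iteration, e.g. a vector width
    is_reduction: bool = False


def num_steps(loop: Loop) -> int:
    # Plain-int analogue of FloorDiv(loop.size, loop.steps) in the diff.
    return loop.size // loop.steps


def pick_parallel_start(loops: List[Loop], ratio: int = 300) -> Tuple[int, int]:
    """Return (start_depth, max_depth) following the heuristic in the diff."""

    def count_same_kind(start: int) -> Tuple[int, int]:
        # Count consecutive loops of the same kind (reduction or not)
        # beginning at `start`, and multiply up their step counts.
        kind = loops[start].is_reduction
        depth, total = 0, 1
        for loop in loops[start:]:
            if loop.is_reduction != kind:
                break
            total *= num_steps(loop)
            depth += 1
        return depth, total

    start_depth = 0
    max_depth, outer_steps = count_same_kind(0)

    # Switch to the first inner loop only when it has far more steps than
    # all outer loops combined (threshold 300, as in the diff).
    if max_depth < len(loops) and outer_steps * ratio < num_steps(loops[max_depth]):
        start_depth = max_depth
        # Assumed recalculation: recount from the new starting depth.
        max_depth, _ = count_same_kind(start_depth)
    return start_depth, max_depth


# Outer loop with 2 steps, inner reduction loop of size 16384.
narrow = [Loop(size=2), Loop(size=16384, steps=16, is_reduction=True)]
wide = [Loop(size=2), Loop(size=16384, steps=32, is_reduction=True)]
print(pick_parallel_start(narrow))  # inner loop has 1024 steps: 2 * 300 < 1024
print(pick_parallel_start(wide))    # inner loop has 512 steps: 2 * 300 >= 512
```

In the second case a comparison on raw sizes (2 * 300 < 16384) would still move the parallel region to the inner loop, even though only 512 iterations are available there; comparing step counts keeps parallelism on the outer loop.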
