add vectorization path on maxpool backward channel last #1907

chunhuanMeng · 2025-08-05T07:10:03Z

Part 2 of #1861
On PVC, scoreboard stalls decrease from 101,628 to 75,976. Instruction fetch and distance stalls also drop significantly, enabling higher effective bandwidth to HBM.

shape	device	before opt	after opt
[4096, 64, 27, 27]	PVC	27.10 ms	12.70 ms
[4096, 192, 13, 13]	PVC	17.97 ms	8.51 ms
[4096, 256, 6, 6]	PVC	5.10 ms	2.47 ms
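For reference, the ratios implied by the numbers above can be worked out with a small helper. This is only an arithmetic sketch over the figures quoted in this description; the names `Row`, `kRows`, `speedup`, and `stall_reduction` are hypothetical and not part of the PR. The three shapes come out at roughly 2.13x, 2.11x, and 2.06x, and the scoreboard stall count drops by about 25%.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper types for illustration only; the timings are the
// ones listed in the table above.
struct Row {
  const char* shape;
  double before_ms;
  double after_ms;
};

constexpr Row kRows[] = {
    {"[4096, 64, 27, 27]", 27.10, 12.70},
    {"[4096, 192, 13, 13]", 17.97, 8.51},
    {"[4096, 256, 6, 6]", 5.10, 2.47},
};

// Speedup factor: time before divided by time after.
inline double speedup(const Row& r) {
  assert(r.after_ms > 0.0);
  return r.before_ms / r.after_ms;
}

// Fractional reduction of a counter, e.g. scoreboard stalls.
inline double stall_reduction(double before, double after) {
  assert(before > 0.0);
  return (before - after) / before;
}

// Prints one line per shape plus the stall reduction.
inline void print_summary() {
  for (const Row& r : kRows) {
    std::printf("%s: %.2fx\n", r.shape, speedup(r));
  }
  std::printf("scoreboard stalls: -%.1f%%\n",
              100.0 * stall_reduction(101628.0, 75976.0));
}
```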

Copilot

Pull Request Overview

This pull request adds a vectorization path for maxpool backward operations in channel-last memory layout to improve performance. The change introduces a new templated kernel implementation that processes multiple elements simultaneously using vector operations.

Refactors existing backward kernel to accumulate gradients locally before writing
Adds new vectorized kernel implementation for channel-last memory layout
Includes vectorization logic (currently commented out) with macro for launching vectorized kernels
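The two ideas summarized above (accumulating gradients locally before a single write, and handling several adjacent channels at once in channel-last layout) can be illustrated with a plain CPU sketch. This is not the SYCL kernel from the PR: `PoolParams`, `window_start`, `window_end`, and `maxpool_backward_channel_last` are hypothetical names, `float` stands in for `scalar_t`, and a square kernel is assumed. The window-range formulas follow the usual gather-style backward pass, where each input position visits only the output positions whose window can cover it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pooling parameters (square kernel assumed for brevity).
struct PoolParams {
  int kernel, stride, pad, dilation;
};

// First output position whose window can cover input position `pos`.
inline int window_start(int pos, const PoolParams& p) {
  const int extent = (p.kernel - 1) * p.dilation + 1;
  return (pos + p.pad < extent) ? 0 : (pos + p.pad - extent) / p.stride + 1;
}

// One past the last output position whose window can cover `pos`.
inline int window_end(int pos, const PoolParams& p, int pooled) {
  return std::min((pos + p.pad) / p.stride + 1, pooled);
}

// Channel-last (NHWC) layout: element (n, h, w, c) is stored at
// ((n * H + h) * W + w) * C + c. `indices` holds, per output element,
// the flat input position h * W + w selected in the forward pass.
template <int VecSize>
void maxpool_backward_channel_last(
    const std::vector<float>& grad_output,  // [N, OH, OW, C]
    const std::vector<int64_t>& indices,    // [N, OH, OW, C]
    std::vector<float>& grad_input,         // [N, H, W, C]
    int N, int C, int H, int W, int OH, int OW, const PoolParams& p) {
  assert(C % VecSize == 0);  // each chunk covers VecSize adjacent channels
  for (int n = 0; n < N; ++n) {
    for (int h = 0; h < H; ++h) {
      for (int w = 0; w < W; ++w) {
        const int oh0 = window_start(h, p), oh1 = window_end(h, p, OH);
        const int ow0 = window_start(w, p), ow1 = window_end(w, p, OW);
        for (int c0 = 0; c0 < C; c0 += VecSize) {
          float grad_vec[VecSize] = {};  // local accumulator
          for (int oh = oh0; oh < oh1; ++oh) {
            for (int ow = ow0; ow < ow1; ++ow) {
              const std::size_t out_base =
                  ((static_cast<std::size_t>(n) * OH + oh) * OW + ow) * C + c0;
              // Adjacent channels are contiguous, so this inner loop maps
              // naturally onto one vector load per operand.
              for (int i = 0; i < VecSize; ++i) {
                if (indices[out_base + i] == static_cast<int64_t>(h) * W + w) {
                  grad_vec[i] += grad_output[out_base + i];
                }
              }
            }
          }
          // Single write per chunk after all contributions are accumulated.
          const std::size_t in_base =
              ((static_cast<std::size_t>(n) * H + h) * W + w) * C + c0;
          for (int i = 0; i < VecSize; ++i) {
            grad_input[in_base + i] = grad_vec[i];
          }
        }
      }
    }
  }
}
```

Because every input element is written exactly once by whoever owns it, this formulation needs no atomics; the scalar variant is simply the `VecSize = 1` instantiation.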

src/ATen/native/xpu/sycl/DilatedMaxPool2d.cpp

Copilot · 2025-08-05T07:10:49Z

src/ATen/native/xpu/sycl/DilatedMaxPool2d.cpp

+  //     case 4:
+  //       LAUNCH_MAXPOOL_BACKWARD_CHANNEL_LAST_VEC(
+  //           scalar_t,
+  //           1,


The vec_size parameter should be 4, not 1, for the case 4 branch. This appears to be a copy-paste error that would prevent proper vectorization when vec_size is 4.

Suggested change

// 1,

// 4,
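The point of this suggestion is that each `case` label in a runtime dispatch should instantiate the kernel with the matching compile-time vector width. A minimal sketch of that pattern follows; `launch_backward_vec` and `dispatch_backward_vec` are hypothetical stand-ins (the PR uses the `LAUNCH_MAXPOOL_BACKWARD_CHANNEL_LAST_VEC` macro, whose full argument list is not shown here), and the set of supported widths is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a kernel launch. It only reports the VecSize
// it was instantiated with so the dispatch can be checked.
template <typename scalar_t, int VecSize>
int launch_backward_vec(const scalar_t* /*grad_output*/, int64_t /*numel*/) {
  return VecSize;
}

// Runtime-to-compile-time dispatch: the template argument in each branch
// must equal the case label, otherwise the branch silently runs a
// narrower (or wider) kernel than intended.
template <typename scalar_t>
int dispatch_backward_vec(int vec_size, const scalar_t* grad_output,
                          int64_t numel) {
  switch (vec_size) {
    case 4:
      return launch_backward_vec<scalar_t, 4>(grad_output, numel);
    case 2:
      return launch_backward_vec<scalar_t, 2>(grad_output, numel);
    default:
      assert(vec_size == 1);  // assumed scalar fallback
      return launch_backward_vec<scalar_t, 1>(grad_output, numel);
  }
}
```

With a mismatch such as passing `1` in the `case 4` branch, the code would still compile and produce correct results, but it would take the scalar path and lose the intended speedup.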

Copilot · 2025-08-05T07:10:50Z

src/ATen/native/xpu/sycl/DilatedMaxPool2d.cpp

+              grad_vec[i] = static_cast<scalar_t>(grad_vec[i]) +
+                  static_cast<scalar_t>(gout_val_vec[i]);


The cast static_cast<scalar_t>(grad_vec[i]) is redundant since grad_vec[i] is already of type scalar_t. This should be simplified to grad_vec[i] += static_cast<scalar_t>(gout_val_vec[i]);

Suggested change

grad_vec[i] = static_cast<scalar_t>(grad_vec[i]) +

static_cast<scalar_t>(gout_val_vec[i]);

grad_vec[i] += static_cast<scalar_t>(gout_val_vec[i]);

src/ATen/native/xpu/sycl/DilatedMaxPool2d.cpp

Co-authored-by: Copilot <[email protected]>

Update DilatedMaxPool2d.cpp

ac60deb

Copilot AI review requested due to automatic review settings August 5, 2025 07:10

Copilot AI reviewed Aug 5, 2025


chunhuanMeng and others added 4 commits August 5, 2025 15:11

Update DilatedMaxPool2d.cpp

76a5583

Update src/ATen/native/xpu/sycl/DilatedMaxPool2d.cpp

18eb934

Co-authored-by: Copilot <[email protected]>

remove unnecessary var

cefb88a

Co-authored-by: Copilot <[email protected]>

fix

25d2766

jianyizh added the kernel_optimization label Aug 6, 2025

chunhuanMeng added 2 commits August 10, 2025 22:32

fix

357beb4

Merge branch 'main' into meng_opt_max_pool_backward

ead7971

chunhuanMeng requested review from jianyizh and xytintel August 11, 2025 05:23

jianyizh requested a review from liangan1 August 13, 2025 02:39

chuanqi129 linked an issue Aug 13, 2025 that may be closed by this pull request

Maxpooling takes too long on BMG #1861

Open
