Commit 768361e
Add less warps config to inner reductions (pytorch#162447)
Add a config with fewer warps for inner reductions to ensure proper vectorization and memory coalescing; the config prefers more work per thread.
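The trade-off described above can be illustrated with a little arithmetic. This is a minimal sketch, not the code changed by this commit: the helper names `elems_per_thread` and `max_load_bytes` are hypothetical, and it assumes a tile of `xblock * rblock` elements is split evenly across the threads of a block, that each thread's share is contiguous in memory, and that a single load is at most 16 bytes (128 bits) wide.

```python
WARP_SIZE = 32        # threads per warp on NVIDIA GPUs
MAX_LOAD_BYTES = 16   # widest single per-thread load (128 bits)


def elems_per_thread(xblock: int, rblock: int, num_warps: int) -> int:
    """Elements each thread handles if the tile is split evenly (hypothetical helper)."""
    threads = num_warps * WARP_SIZE
    return max(1, (xblock * rblock) // threads)


def max_load_bytes(xblock: int, rblock: int, num_warps: int, itemsize: int) -> int:
    """Widest load a thread could issue, assuming its elements are contiguous."""
    return min(MAX_LOAD_BYTES, elems_per_thread(xblock, rblock, num_warps) * itemsize)


if __name__ == "__main__":
    # One row of 1024 fp16 values (2 bytes each) reduced along the inner dimension.
    for num_warps in (8, 4, 2):
        per_thread = elems_per_thread(1, 1024, num_warps)
        width = max_load_bytes(1, 1024, num_warps, itemsize=2)
        print(f"num_warps={num_warps}: {per_thread} elems/thread, up to {width}-byte loads")
```

Under these assumptions, 8 warps leave each thread only 4 fp16 elements (8 bytes), while 2 warps give each thread 16 elements, enough to fill a full 16-byte load. Whether fewer warps actually win depends on the kernel and hardware, which is why such a config is added as a candidate rather than a replacement.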
<img width="1717" height="731" alt="Screenshot 2025-09-17 at 10 03 25 AM" src="https://github.com/user-attachments/assets/7b1f4a30-62f2-4bee-bb9c-122501bde63e" />
Pull Request resolved: pytorch#162447
Approved by: https://github.com/v0i0, https://github.com/eellison, https://github.com/shunting314
1 parent 9341ede commit 768361e
1 file changed: +18 -3 lines changed

| Original file lines | Diff lines | Lines removed (original numbering) | Lines added (diff numbering) |
|---|---|---|---|
| 2333-2338 | 2333-2339 | none | 2336 |
| 2360-2366 | 2361-2372 | 2363 | 2364-2369 |
| 2630-2635 | 2636-2642 | none | 2639 |
| 2681-2687 | 2688-2694 | 2684 | 2691 |
| 2911-2917 | 2918-2930 | 2914 | 2921-2927 |
| 2954-2959 | 2967-2973 | none | 2970 |
| 2965-2970 | 2979-2985 | none | 2982 |