Commit e2d141d
set thread_work_size to 4 for unrolled kernel (pytorch#154541)
set thread_work_size to 4 for unrolled kernel (pytorch#152396)
Previous PRs that enabled vectorization with a vector size of 8 inadvertently regressed the performance of the unrolled kernel.
Pull Request resolved: pytorch#152396
Approved by: https://github.com/BoyuanFeng, https://github.com/msaroufim, https://github.com/malfet, https://github.com/Aidyn-A, https://github.com/atalman
(cherry picked from commit adebb8b)
Co-authored-by: Natalia Gimelshein
1 parent 1214198
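To make `thread_work_size` concrete, here is a minimal Python model of the index arithmetic behind an unrolled elementwise kernel. It assumes a layout in which each thread handles `thread_work_size` elements strided by the block's thread count, modeled loosely on PyTorch's unrolled CUDA path. `NUM_THREADS = 128` and the helper names are hypothetical, chosen only for illustration. The sketch compares the launch geometry for `thread_work_size` values of 4 and 8; it does not reproduce or explain the performance regression itself.

```python
from typing import List

# Hypothetical threads-per-block value, assumed for illustration only.
NUM_THREADS = 128


def block_work_size(thread_work_size: int) -> int:
    """Elements covered by one block in the unrolled layout."""
    return NUM_THREADS * thread_work_size


def grid_size(n: int, thread_work_size: int) -> int:
    """Blocks needed to cover n elements (ceiling division)."""
    bws = block_work_size(thread_work_size)
    return (n + bws - 1) // bws


def thread_indices(n: int, block: int, thread: int, thread_work_size: int) -> List[int]:
    """Linear indices handled by one thread (assumed unrolled layout).

    Consecutive threads touch consecutive elements; each thread strides by
    NUM_THREADS between its thread_work_size items. Indices at or past n are
    dropped, which models the tail guard for the last block.
    """
    base = block * block_work_size(thread_work_size) + thread
    return [
        idx
        for idx in (base + i * NUM_THREADS for i in range(thread_work_size))
        if idx < n
    ]


if __name__ == "__main__":
    n = 1 << 20  # 1,048,576 elements
    for tws in (4, 8):
        print(f"thread_work_size={tws} block_work_size={block_work_size(tws)} "
              f"blocks={grid_size(n, tws)}")
    # Thread 1 of block 0 with thread_work_size=4.
    print(thread_indices(n, block=0, thread=1, thread_work_size=4))
```

Under these assumptions, halving `thread_work_size` from 8 to 4 doubles the number of blocks launched for the same tensor (1024 vs. 2048 for 2^20 elements), while each thread processes half as many elements.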
1 file changed, +11 -2 lines changed

| Hunk | Original file lines | New file lines | Change |
|---|---|---|---|
| 1 | 83-88 | 83-96 | 8 lines added (new lines 86-93) after original line 85 |
| 2 | 336-344 | 344-353 | 2 lines removed (original lines 339, 341); 3 lines added (new lines 347-348, 350) |