Commit d072254
Extend vec backend with BF16 SVE intrinsics (pytorch#143666)
- Following the work in pytorch#119571, BF16 SVE intrinsics are added to the Vectorized class, providing ~1.7x speedup on `silu` and `softmax`.
- Added bf16 detection in CMake
- Added a guard for native NEON code to prevent compilation errors
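
For readers unfamiliar with BF16, the following is a minimal, portable sketch of what a BF16 kernel does per lane: widen to float32, compute, and narrow back with round-to-nearest-even. It is not the code added in this commit. All helper names (`float_to_bf16`, `bf16_to_float`, `silu_bf16`, `built_with_sve_bf16`) are hypothetical, and the preprocessor check assumes the ACLE feature macros `__ARM_FEATURE_SVE` and `__ARM_FEATURE_BF16`; the guard actually used in the PR may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only; not PyTorch's Vectorized API.

// Narrow a float32 to bfloat16 bits using round-to-nearest-even.
inline uint16_t float_to_bf16(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  if ((bits & 0x7fffffffu) > 0x7f800000u) {
    // NaN: keep the sign and exponent, force a quiet-NaN mantissa bit.
    return static_cast<uint16_t>((bits >> 16) | 0x0040u);
  }
  // Add 0x7fff plus the lowest kept bit so that ties round to even.
  uint32_t rounding_bias = 0x7fffu + ((bits >> 16) & 1u);
  return static_cast<uint16_t>((bits + rounding_bias) >> 16);
}

// Widen bfloat16 bits to float32: bf16 is the upper 16 bits of a float32.
inline float bf16_to_float(uint16_t h) {
  uint32_t bits = static_cast<uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Scalar reference for silu(x) = x / (1 + exp(-x)) on bf16 data.
// The arithmetic is done in float32 and the result is stored as bf16.
inline void silu_bf16(const uint16_t* in, uint16_t* out, std::size_t n) {
  assert(n == 0 || (in != nullptr && out != nullptr));
  for (std::size_t i = 0; i < n; ++i) {
    float x = bf16_to_float(in[i]);
    out[i] = float_to_bf16(x / (1.0f + std::exp(-x)));
  }
}

// Reports whether this translation unit was compiled for SVE with BF16.
// Assumption: the ACLE macros below are the relevant ones for the target.
inline bool built_with_sve_bf16() {
#if defined(__ARM_FEATURE_SVE) && defined(__ARM_FEATURE_BF16)
  return true;
#else
  return false;
#endif
}
```

A vectorized path would apply the same widen, compute, narrow sequence to a whole register of lanes at once; the scalar loop here is only a reference for checking results.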
@aditew01 @maajidkhann please have a look
Pull Request resolved: pytorch#143666
Approved by: https://github.com/swolchok, https://github.com/aditew01
Co-authored-by: Aditya Tewari <[email protected]>
1 parent 68dfd44 commit d072254
File tree
15 files changed, +731 -43 lines changed
- aten/src/ATen
- cpu/vec
- sve
- vec256
- native
- cpu
- test
- cmake
- Modules
- torch/_inductor