Commit 40be511

ggml-zdnn: fix ggml-org#15414, activate FP16 and BF16 acceleration and incorrect zTensor free (ggml-org#15839)
1 parent 4bf5549 commit 40be511

6 files changed: +7760 -3539 lines changed

docs/build-s390x.md

Lines changed: 3 additions & 3 deletions
@@ -241,8 +241,8 @@ IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongl
 | | VX/VXE/VXE2 | zDNN | Spyre |
 |------------|-------------|------|-------|
 | FP32 ||||
-| FP16 || ||
-| BF16 | 🚫 | ||
+| FP16 || ||
+| BF16 | 🚫 | ||
 | Q4_0 ||||
 | Q4_1 ||||
 | MXFP4 | 🚫 |||
@@ -272,4 +272,4 @@ IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongl
 - 🚫 - acceleration unavailable, will still run using scalar implementation
 - ❓ - acceleration unknown, please contribute if you can test it yourself
 
-Last Updated by **Aaron Teo ([email protected])** on Sep 6, 2025.
+Last Updated by **Aaron Teo ([email protected])** on Sep 7, 2025.

docs/ops.md

Lines changed: 7 additions & 0 deletions
@@ -18,6 +18,7 @@ Legend:
 | ACC ||||||||||
 | ADD ||||| 🟡 | 🟡 ||||
 | ADD1 ||||||||||
+| ADD_ID ||||||||||
 | ARANGE ||||||||||
 | ARGMAX ||||||||||
 | ARGSORT ||||||||||
@@ -26,6 +27,7 @@ Legend:
 | CONT || 🟡 |||| 🟡 | 🟡 | 🟡 ||
 | CONV_2D ||||||||||
 | CONV_2D_DW ||||||||||
+| CONV_3D ||||||||||
 | CONV_TRANSPOSE_1D ||||||||||
 | CONV_TRANSPOSE_2D ||||||||||
 | COS ||||| 🟡 ||| 🟡 ||
@@ -49,9 +51,11 @@ Legend:
 | GET_ROWS || 🟡 || 🟡 || 🟡 | 🟡 | 🟡 ||
 | GET_ROWS_BACK ||| 🟡 | 🟡 ||||||
 | GROUP_NORM ||||||||||
+| GROUP_NORM_MUL_ADD ||||||||||
 | HARDSIGMOID |||| 🟡 | 🟡 || 🟡 |||
 | HARDSWISH |||| 🟡 | 🟡 || 🟡 |||
 | IM2COL ||||| 🟡 |||||
+| IM2COL_3D ||||||||||
 | L2_NORM ||||||||||
 | LEAKY_RELU ||||||||||
 | LOG ||||||||||
@@ -61,7 +65,9 @@ Legend:
 | MUL_MAT_ID || 🟡 |||| 🟡 | 🟡 |||
 | NEG |||| 🟡 | 🟡 || 🟡 |||
 | NORM ||||| 🟡 ||| 🟡 ||
+| NORM_MUL_ADD ||||||||||
 | OPT_STEP_ADAMW ||||||||||
+| OPT_STEP_SGD ||||||||||
 | OUT_PROD | 🟡 || 🟡 | 🟡 ||| 🟡 |||
 | PAD ||||||||||
 | PAD_REFLECT_1D ||||||||||
@@ -98,6 +104,7 @@ Legend:
 | SUM ||||||||||
 | SUM_ROWS ||||||||||
 | SWIGLU ||||| 🟡 ||| 🟡 ||
+| SWIGLU_OAI ||||||||||
 | TANH |||| 🟡 | 🟡 || 🟡 | 🟡 ||
 | TIMESTEP_EMBEDDING ||||||||||
 | UPSCALE || 🟡 ||| 🟡 || 🟡 |||
