ggml-zendnn : add MUL_MAT_ID op support for MoE models (#21315)
* ggml-zendnn : add MUL_MAT_ID op support for MoE models
- Add MUL_MAT_ID op acceleration for Mixture-of-Experts models
- MUL_MAT_ID op fallback to CPU backend if total experts > 32
- Point ZenDNN lib to latest bits ZenDNN-2026-WW13
* ggml-zendnn : add braces to sgemm failure condition for consistency
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
---------
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
docs/backend/ZenDNN.md
5 additions & 4 deletions
@@ -57,13 +57,14 @@ ZenDNN is optimized for AMD EPYC™ processors and AMD Ryzen™ processors based
## Supported Operations
-The ZenDNN backend currently accelerates **matrix multiplication (MUL_MAT)** operations only. Other operations are handled by the standard CPU backend.
+The ZenDNN backend accelerates **matrix multiplication (MUL_MAT)** and **expert-based matrix multiplication (MUL_MAT_ID)** operations. Other operations are handled by the standard CPU backend.
| MUL_MAT | Support | Accelerated via ZenDNN LowOHA MatMul |
+| MUL_MAT_ID | Support | Accelerated via ZenDNN LowOHA MatMul (MoE) |
-*Note:* Since only MUL_MAT is accelerated, models will benefit most from ZenDNN when matrix multiplications dominate the computational workload (which is typical for transformer-based LLMs).
+*Note:* Since MUL_MAT and MUL_MAT_ID are accelerated, models will benefit most from ZenDNN when matrix multiplications dominate the computational workload (which is typical for transformer-based LLMs and Mixture-of-Experts models).
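To make "expert-based matrix multiplication" concrete, here is a minimal pure-Python sketch of what a MUL_MAT_ID-style operation computes: each token is multiplied only by the expert matrices it was routed to. This is a simplified model of the semantics, not the ggml or ZenDNN implementation; the names `mat_vec`, `mul_mat_id`, `experts`, and `routing` are hypothetical and the data layout is an assumption chosen for readability.

```python
def mat_vec(w, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def mul_mat_id(expert_weights, tokens, expert_ids):
    """Apply, for each token, only the expert matrices selected for it.

    expert_weights: list of matrices, one per expert
    tokens:         list of input vectors, one per token
    expert_ids:     expert_ids[t] lists the experts routed to token t
    Returns out[t][k] = expert_weights[expert_ids[t][k]] @ tokens[t].
    """
    return [
        [mat_vec(expert_weights[e], x) for e in ids]
        for x, ids in zip(tokens, expert_ids)
    ]


# Three toy 2x2 experts (hypothetical values for illustration only).
experts = [
    [[1, 0], [0, 1]],  # expert 0: identity
    [[2, 0], [0, 2]],  # expert 1: scale by 2
    [[0, 1], [1, 0]],  # expert 2: swap the two components
]
tokens = [[1, 2], [3, 4]]
routing = [[0, 2], [1, 2]]  # two experts used per token

out = mul_mat_id(experts, tokens, routing)
```

Only the selected experts are touched for each token, which is why a plain MUL_MAT kernel alone does not cover Mixture-of-Experts layers.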
## DataType Supports
@@ -181,7 +182,7 @@ For detailed profiling and logging options, refer to the [ZenDNN Logging Documen
## Known Issues
-- **Limited operation support**: Currently only matrix multiplication (MUL_MAT) is accelerated via ZenDNN. Other operations fall back to the standard CPU backend.
+- **Limited operation support**: Currently matrix multiplication (MUL_MAT) and expert-based matrix multiplication (MUL_MAT_ID) are accelerated via ZenDNN. Other operations fall back to the standard CPU backend. Future updates may expand supported operations.
- **BF16 support**: BF16 operations require AMD Zen 4 or Zen 5 architecture (EPYC 9004/9005 series). On older CPUs, operations will use FP32.
- **NUMA awareness**: For multi-socket systems, manual NUMA binding may be required for optimal performance.
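The operation-support item in the list above, combined with the 32-expert limit stated in the commit message, amounts to a simple dispatch rule. The sketch below models that rule only. `pick_backend` and `MAX_ZENDNN_EXPERTS` are hypothetical names, and the real backend may apply further checks (for example on data types or tensor shapes) that are not modeled here.

```python
# Threshold taken from the commit message: MUL_MAT_ID falls back to the
# CPU backend if the total number of experts is greater than 32.
MAX_ZENDNN_EXPERTS = 32


def pick_backend(op: str, n_experts: int = 0) -> str:
    """Return "zendnn" if the op would be accelerated, else "cpu".

    Hypothetical helper for illustration; it only encodes the rules
    stated in the commit message and the documentation diff.
    """
    if op == "MUL_MAT":
        return "zendnn"
    if op == "MUL_MAT_ID" and n_experts <= MAX_ZENDNN_EXPERTS:
        return "zendnn"
    # Everything else is handled by the standard CPU backend.
    return "cpu"
```

For example, under these assumptions `pick_backend("MUL_MAT_ID", 8)` selects ZenDNN, while a model with 64 experts would run its MUL_MAT_ID ops on the CPU backend.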
@@ -216,4 +217,4 @@ Please add the **[ZenDNN]** prefix/tag in issues/PRs titles to help the ZenDNN-t
## TODO
-- Expand operation support beyond MUL_MAT (attention operations, activations, etc.)
+- Expand operation support beyond MUL_MAT and MUL_MAT_ID (attention operations, activations, etc.)