
Commit f1ac841

z-vishal and taronaeo authored
ggml-zendnn : add MUL_MAT_ID op support for MoE models (#21315)
* ggml-zendnn : add MUL_MAT_ID op support for MoE models

  - Add MUL_MAT_ID op acceleration for Mixture-of-Experts models
  - MUL_MAT_ID op fallback to CPU backend if total experts > 32
  - Point ZenDNN lib to latest bits ZenDNN-2026-WW13

* ggml-zendnn : add braces to sgemm failure condition for consistency

Co-authored-by: Aaron Teo <taronaeo@gmail.com>

---------

Co-authored-by: Aaron Teo <taronaeo@gmail.com>
1 parent b069b10 commit f1ac841


5 files changed (+2959 -7219 lines)

docs/backend/ZenDNN.md

Lines changed: 5 additions & 4 deletions
@@ -57,13 +57,14 @@ ZenDNN is optimized for AMD EPYC™ processors and AMD Ryzen™ processors based
 
 ## Supported Operations
 
-The ZenDNN backend currently accelerates **matrix multiplication (MUL_MAT)** operations only. Other operations are handled by the standard CPU backend.
+The ZenDNN backend accelerates **matrix multiplication (MUL_MAT)** and **expert-based matrix multiplication (MUL_MAT_ID)** operations. Other operations are handled by the standard CPU backend.
 
 | Operation | Status | Notes |
 |:-------------|:-------:|:----------------------------------------------:|
 | MUL_MAT | Support | Accelerated via ZenDNN LowOHA MatMul |
+| MUL_MAT_ID | Support | Accelerated via ZenDNN LowOHA MatMul (MoE) |
 
-*Note:* Since only MUL_MAT is accelerated, models will benefit most from ZenDNN when matrix multiplications dominate the computational workload (which is typical for transformer-based LLMs).
+*Note:* Since MUL_MAT and MUL_MAT_ID are accelerated, models will benefit most from ZenDNN when matrix multiplications dominate the computational workload (which is typical for transformer-based LLMs and Mixture-of-Experts models).
 
 ## DataType Supports
 
@@ -181,7 +182,7 @@ For detailed profiling and logging options, refer to the [ZenDNN Logging Documen
 
 ## Known Issues
 
-- **Limited operation support**: Currently only matrix multiplication (MUL_MAT) is accelerated via ZenDNN. Other operations fall back to the standard CPU backend.
+- **Limited operation support**: Currently matrix multiplication (MUL_MAT) and expert-based matrix multiplication (MUL_MAT_ID) are accelerated via ZenDNN. Other operations fall back to the standard CPU backend. Future updates may expand supported operations.
 - **BF16 support**: BF16 operations require AMD Zen 4 or Zen 5 architecture (EPYC 9004/9005 series). On older CPUs, operations will use FP32.
 - **NUMA awareness**: For multi-socket systems, manual NUMA binding may be required for optimal performance.
 
@@ -216,4 +217,4 @@ Please add the **[ZenDNN]** prefix/tag in issues/PRs titles to help the ZenDNN-t
 
 ## TODO
 
-- Expand operation support beyond MUL_MAT (attention operations, activations, etc.)
+- Expand operation support beyond MUL_MAT and MUL_MAT_ID (attention operations, activations, etc.)

docs/ops.md

Lines changed: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ Legend:
 | MEAN ||||||||||||
 | MUL ||||| 🟡 |||||||
 | MUL_MAT | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 |
-| MUL_MAT_ID || 🟡 ||| 🟡 | 🟡 | 🟡 ||| ||
+| MUL_MAT_ID || 🟡 ||| 🟡 | 🟡 | 🟡 ||| 🟡 ||
 | NEG |||| 🟡 |||| 🟡 ||||
 | NORM |||||||| 🟡 ||||
 | OPT_STEP_ADAMW ||||||||||||
