Commit 4eadb95
[NPUW]Enable MoE (GPT-OSS-20B) on NPU - HOST_ROUTED. (#33372)
### Details:
Enable GPT-OSS-20B on NPUW.
**Prefill:**
Subgraph isolation: 3 graphs per REPEAT (Router - Experts - Downstream)
Transformation for experts: E_NUM x SEQ_LEN -> 1 x chunk_size (chunk_size is one of 16/32/64/128/256)
Execution: Process multiple tokens by iterating through the experts sequentially.
- For each expert: process all tokens assigned to it (potentially in chunks)
- Use dynamic chunk sizing (256/128/64/32/16) based on token count
- Accumulate expert outputs into a global buffer
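
The prefill loop described above can be sketched in plain Python. This is an illustration only, not the code from this commit: `plan_chunks`, `run_prefill`, the greedy chunk-selection policy, the zero padding of the last chunk, and the scalar "hidden states" are all assumptions made to keep the example small.

```python
CHUNK_SIZES = (256, 128, 64, 32, 16)  # chunk sizes listed in the commit message


def plan_chunks(num_tokens):
    """Split num_tokens into chunks drawn from CHUNK_SIZES (assumed greedy policy)."""
    plan = []
    remaining = num_tokens
    while remaining > 0:
        # Pick the largest chunk that fits; otherwise use the smallest one and pad.
        size = next((c for c in CHUNK_SIZES if c <= remaining), CHUNK_SIZES[-1])
        plan.append(size)
        remaining -= size
    return plan


def run_prefill(hidden, routing, experts):
    """Iterate through experts sequentially and accumulate into one output buffer.

    hidden:  one value per token (a scalar stands in for a hidden-state vector)
    routing: dict mapping expert_id -> list of (token_index, router_score)
    experts: list of callables, each mapping a list of inputs to a list of outputs
    """
    output = [0.0] * len(hidden)  # global accumulation buffer
    for expert_id, expert in enumerate(experts):
        assigned = routing.get(expert_id, [])
        start = 0
        for size in plan_chunks(len(assigned)):
            chunk = assigned[start:start + size]
            start += size
            # Pad the chunk to the fixed size expected by the 1 x chunk_size graph.
            inputs = [hidden[t] for t, _ in chunk] + [0.0] * (size - len(chunk))
            results = expert(inputs)
            # Outputs for padded slots are dropped; real ones are score-weighted.
            for (t, score), y in zip(chunk, results):
                output[t] += score * y
    return output
```

Under this greedy policy, 300 tokens routed to one expert would be processed as chunks of 256, 32 and 16, with the last chunk padded.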
**Decoding:**
Subgraph isolation: 3 graphs per REPEAT (Router - Experts - Downstream)
Transformation for experts: E_NUM x 1 -> E_ACT_NUM x 1 -> unroll active experts for better performance
Execution: Process one token with K active experts in a single batched inference.
- Set all K expert weights at once (batch unrolling)
- Set K router scores (one per expert)
- Execute a single inference that processes all K experts in one submission
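
The decoding steps above can be sketched in the same simplified style. The names `decode_step` and `batched_experts` are hypothetical, and each expert is reduced to a single scalar weight; the real subgraph and its tensors are not described in this message.

```python
def batched_experts(x, weights, scores):
    """Stand-in for the unrolled K-expert subgraph, evaluated in one call.

    Each expert is modelled as a scalar multiply (an assumption for illustration).
    """
    return sum(s * w * x for w, s in zip(weights, scores))


def decode_step(x, active, expert_weights):
    """Process one token with K active experts in a single submission.

    x:              the token's value (a scalar stands in for a hidden-state vector)
    active:         list of K (expert_id, router_score) pairs chosen by the router
    expert_weights: per-expert weights, indexed by expert_id (E_NUM entries)
    """
    weights = [expert_weights[e] for e, _ in active]  # set all K expert weights at once
    scores = [s for _, s in active]                   # set K router scores
    return batched_experts(x, weights, scores)        # one inference for all K experts
```

For example, with `expert_weights = [1.0, 2.0, 3.0, 4.0]` and experts 1 and 3 active, only those two weights are bound before the single call.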
**File structure:**
```
src/plugins/intel_npu/src/plugin/npuw/
├── moe/ # MoE module (NEW)
│ ├── moe_config.hpp # Configuration data structures
│ ├── moe_types.hpp # Type definitions (MoEIO)
│ ├── moe_infer_utils.hpp/cpp # Utility functions & RequestCache
│ ├── moe_resources.hpp/cpp # Resource management
│ └── moe_executor.hpp/cpp # Core execution logic
├── just_sync_infer_request.hpp/cpp # Integration point
├── compiled_model.hpp # Model metadata (CompiledModelDesc)
└── moe_transformations/ # MoE compilation and transformations
    ├── moe_transformation.hpp/cpp
    └── moe_unroll_patterns.hpp/cpp
```
The model can be validated with the config below:
```
{
"NPUW_DEVICES" : "NPU",
"MAX_PROMPT_LEN" : 1024,
"NPUW_MOE_TOKEN_CHUNK_SIZE" : 0,
"NPUW_LLM_GENERATE_MOE_HINT" : "HOST_ROUTED",
"NPUW_F16IC" : "YES",
"NPUW_LLM_OPTIMIZE_V_TENSORS" : "YES",
"NPU_TURBO" : "YES",
"NPUW_DUMP_SUBS" : "YES",
"NPUW_DUMP_IO" : "NO",
"NPUW_MOE_POOL_SIZE" : 8
}
```
### Tickets:
- *[EISW-190615](https://jira.devtools.intel.com/browse/EISW-190615)*
---------
Signed-off-by: intelgaoxiong <xiong.gao@intel.com>

1 parent 367ad56, commit 4eadb95
File tree
34 files changed (+6488, -38 lines)
- src/plugins/intel_npu
- src
- al
- include/intel_npu
- config
- src/config
- plugin
- npuw
- moe_transformations
- moe
- partitioning
- online
- utils
- patterns
- src
- tests/unit
- npuw