Commit 135bc45
fix: workaround duplicated AllGather for EP+FSDP2 (#173)
### What does this PR do?
Compute the shared expert first to work around the duplicated all-gather
issue in EP+FSDP2, which appears to be a bug in PyTorch FSDP2.
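
The reordering described above can be illustrated with a minimal sketch. This is not the patched file: `ToyMoE`, `routed`, and `shared_expert` are hypothetical names, the block runs on a single device without EP or FSDP2, and it only shows the call order (shared expert before routed experts), not the all-gather behavior itself.

```python
import torch
from torch import nn


class ToyMoE(nn.Module):
    """Toy MoE block. All names are hypothetical, not taken from the PR."""

    def __init__(self, dim: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.shared_expert = nn.Linear(dim, dim)

    def routed(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Pick the top-k experts per token.
        probs = self.gate(x).softmax(dim=-1)
        weights, idx = torch.topk(probs, self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                      # (tokens, top_k)
            token_mask = mask.any(dim=-1)        # tokens routed to expert e
            if token_mask.any():
                w = (weights * mask).sum(dim=-1, keepdim=True)[token_mask]
                out[token_mask] += w * expert(x[token_mask])
        return out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Shared expert is evaluated first, then the routed experts.
        shared = self.shared_expert(x)
        return shared + self.routed(x)
```

Because addition is commutative, the output is the same in either order; the change only affects when each submodule's parameters are touched during the forward pass.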
Before:
<img width="2230" height="302" alt="76591"
src="https://github.com/user-attachments/assets/f9f4e553-5678-4fa8-9fcf-77750ad165bf"
/>
After:
<img width="1756" height="188" alt="4480"
src="https://github.com/user-attachments/assets/749b572c-214b-4121-bfad-5ec2cea7f191"
/>

1 parent e020de6 commit 135bc45
1 file changed: 14 additions & 6 deletions
| Hunk | Original file lines | New file lines | Change |
|---|---|---|---|
| 1 | 370-375 | 370-377 | 2 lines added (new lines 373-374) |
| 2 | 469-481 | 471-489 | 6 lines removed (original lines 472-476, 478); 12 lines added (new lines 474-475, 477-486) |