Conversation

@akacmazz commented on Aug 12, 2025

  • Replace data-dependent .nonzero() operation with static expert loop
  • Resolve GuardOnDataDependentSymNode error during torch.export
  • Maintain identical functionality while enabling export compatibility
  • Fix issue introduced in PR #32429 (Skip non-selected experts for mixtral and qwen2_moe)
  • Add tests for torch.export compatibility

What does this PR do?

This PR fixes the torch.export compatibility issue #38518 for Mixtral MoE models, which was introduced by PR #32429.

Problem

The optimization in PR #32429 introduced a .nonzero() operation that creates data-dependent tensor shapes, causing torch.export to fail with:
torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode: Could not extract specialized integer from data-dependent expression
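
For intuition, a toy module with the same pattern (a .nonzero() result driving a Python loop) reproduces this class of failure under torch.export. The module and tensor names below are made up for illustration, and the exact error text varies with the PyTorch version:

import torch

class ToyRouter(torch.nn.Module):
    def forward(self, scores):
        # The number of elements returned by .nonzero() depends on the *values*
        # in scores, so the Python loop below has a data-dependent bound.
        hit = (scores > 0).nonzero().flatten()
        total = scores.new_zeros(())
        for i in hit:
            total = total + scores[i]
        return total

# Expected to raise a data-dependent guard error during tracing
# (GuardOnDataDependentSymNode or similar, depending on the PyTorch version):
# torch.export.export(ToyRouter(), (torch.randn(8),))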

Solution

Replace the dynamic expert selection loop:

expert_hit = torch.greater(expert_mask.sum(dim=(-1, -2)), 0).nonzero()
for expert_idx in expert_hit:

With a static loop over all experts:
for expert_idx in range(self.num_experts):
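
For reference, a minimal self-contained sketch of a static-loop dispatch; the function name, toy usage, and tensor shapes below are illustrative (following the usual Mixtral routing layout), not the exact repository code:

import torch
import torch.nn.functional as F

def static_expert_dispatch(hidden_states, expert_mask, routing_weights, experts):
    # hidden_states:   (num_tokens, hidden_dim)
    # expert_mask:     (num_experts, top_k, num_tokens) one-hot routing mask
    # routing_weights: (num_tokens, top_k) normalized gate weights
    final_hidden_states = torch.zeros_like(hidden_states)
    for expert_idx in range(len(experts)):  # static bound, no .nonzero()
        idx, top_x = torch.where(expert_mask[expert_idx])
        # If this expert received no tokens, top_x is empty and index_add_ adds nothing.
        routed = experts[expert_idx](hidden_states[top_x]) * routing_weights[top_x, idx, None]
        final_hidden_states.index_add_(0, top_x, routed)
    return final_hidden_states

# Toy usage with Linear layers standing in for expert MLPs:
num_experts, top_k, num_tokens, hidden_dim = 8, 2, 5, 16
experts = torch.nn.ModuleList(torch.nn.Linear(hidden_dim, hidden_dim) for _ in range(num_experts))
hidden_states = torch.randn(num_tokens, hidden_dim)
routing_weights, selected = torch.topk(F.softmax(torch.randn(num_tokens, num_experts), dim=-1), top_k, dim=-1)
routing_weights = routing_weights / routing_weights.sum(dim=-1, keepdim=True)
expert_mask = F.one_hot(selected, num_classes=num_experts).permute(2, 1, 0)
out = static_expert_dispatch(hidden_states, expert_mask, routing_weights, experts)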

Impact

  • ✅ Enables torch.export compatibility for Mixtral models
  • ✅ Maintains identical functionality (empty experts contribute 0 naturally)
  • ✅ Minimal performance impact (same computation, different loop structure)
  • ✅ Consistent with other MoE implementations (Jamba, DBRX)

Testing

  • Verified torch.export works without errors
  • Confirmed functionality preservation with identical outputs
  • Tested with various input configurations

Fixes torch.export compatibility issues reported for Mixtral-8x7B models.
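
As a rough illustration of the kind of check described above, an export round-trip on a tiny randomly initialized Mixtral could look like this; the config sizes, the use of strict=False, and the output comparison are illustrative choices, not the actual tests added in this PR:

import torch
from transformers import MixtralConfig, MixtralForCausalLM

# Tiny config so the check runs quickly; sizes are arbitrary.
config = MixtralConfig(
    vocab_size=1000, hidden_size=64, intermediate_size=128,
    num_hidden_layers=2, num_attention_heads=4, num_key_value_heads=2,
    num_local_experts=4, num_experts_per_tok=2, use_cache=False,
)
model = MixtralForCausalLM(config).eval()
input_ids = torch.randint(0, config.vocab_size, (1, 8))

with torch.no_grad():
    eager_logits = model(input_ids).logits

# Before the fix, tracing here raised GuardOnDataDependentSymNode.
exported = torch.export.export(model, (input_ids,), strict=False)
exported_logits = exported.module()(input_ids).logits
torch.testing.assert_close(eager_logits, exported_logits)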

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@Cyrilvallez
@ArthurZucker
@gante

- Replace data-dependent .nonzero() operation with static expert loop
- Resolves GuardOnDataDependentSymNode error during torch.export
- Maintains identical functionality while enabling export compatibility
- Fixes issue introduced in PR huggingface#32429
- Add tests for torch.export compatibility
@akacmazz changed the title to Fix torch.export compatibility for Mixtral MoE models on Aug 12, 2025
- Auto-generate modeling_mixtral.py with the same fix
- Apply black formatting
- Fix repository consistency check
@akacmazz closed this Aug 12, 2025
@akacmazz reopened this Aug 12, 2025
@akacmazz force-pushed the fix-mixtral-torch-export-compatibility branch 3 times, most recently from 9b41625 to c3e3c5e on August 12, 2025 at 20:23
@ArthurZucker (Collaborator) left a comment

Thanks for the PR, but the reason we cannot do that is that it is really a lot less efficient! I would recommend instead having two paths, one for training and one for inference; the inference path doesn't need to loop at all and can just repeat the inputs.

@akacmazz (Author)

Thanks for the feedback, I will work on this.

…ility

  - Training path: Keep efficient .nonzero() for performance
  - Inference path: Use static loop for torch.export compatibility
  - Add conditional check to skip empty experts in inference
  - Update tests to validate inference mode export
  - Addresses maintainer feedback on performance concerns
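
The two-path structure described in this commit might look roughly like the following; this is a sketch of the branching idea using the same illustrative dispatch shapes as earlier in the thread, not the merged code:

import torch

def dispatch_experts(hidden_states, expert_mask, routing_weights, experts, training):
    final_hidden_states = torch.zeros_like(hidden_states)
    if training:
        # Dynamic path: visit only experts that actually received tokens.
        # Data-dependent, so fine in eager training but not for torch.export.
        expert_hit = torch.greater(expert_mask.sum(dim=(-1, -2)), 0).nonzero()
        expert_indices = expert_hit.flatten().tolist()
    else:
        # Static path: fixed loop bound, no data-dependent control flow to guard on.
        expert_indices = range(len(experts))
    for expert_idx in expert_indices:
        idx, top_x = torch.where(expert_mask[expert_idx])
        routed = experts[expert_idx](hidden_states[top_x]) * routing_weights[top_x, idx, None]
        final_hidden_states.index_add_(0, top_x, routed)
    return final_hidden_states

Since torch.export typically traces an eval-mode model, only the static branch needs to be export-friendly, which matches the split described in the commit bullets above.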
@akacmazz force-pushed the fix-mixtral-torch-export-compatibility branch from 952a181 to 0aa9de7 on August 13, 2025 at 08:51
akacmazz and others added 9 commits August 13, 2025 12:23
- Apply black formatting to fix code style
- Fix import sorting with isort
- Address CI code quality checks
- Fix import organization in modeling_mixtral.py
- Fix import organization in modular_mixtral.py
- Address ruff I001 import sorting warnings
- Remove manually edited modeling_mixtral.py
- Auto-generate from modular_mixtral.py using proper tool
- Ensure consistency between modular and generated files
- Fix check_repository_consistency CI failure
- Remove 'if top_x.shape[0] == 0: continue' check that causes GuardOnDataDependentSymNode error
- Empty expert tensors naturally contribute 0, no explicit check needed
- Update test error message for clarity
- Fixes tests_processors CI failure

Co-authored-by: ArthurZucker <[email protected]>
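
A quick way to see why the removed check is unnecessary (as the commit above notes, empty expert tensors contribute 0 naturally): index_add_ with an empty index tensor is a no-op. A small standalone check:

import torch

out = torch.zeros(4, 3)
empty_idx = torch.empty(0, dtype=torch.long)
empty_src = torch.empty(0, 3)
out.index_add_(0, empty_idx, empty_src)  # adds nothing
assert torch.equal(out, torch.zeros(4, 3))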

[For maintainers] Suggested jobs to run (before merge)

run-slow: mixtral
