Fix torch.export compatibility for Mixtral MoE models #40114
base: main
Conversation
- Replace data-dependent .nonzero() operation with static expert loop
- Resolves GuardOnDataDependentSymNode error during torch.export
- Maintains identical functionality while enabling export compatibility
- Fixes issue introduced in PR huggingface#32429
- Add tests for torch.export compatibility
- Auto-generate modeling_mixtral.py with the same fix
- Apply black formatting
- Fix repository consistency check
Force-pushed from 9b41625 to c3e3c5e
Thanks for the PR, but we can't do it this way because it is a lot less efficient! I would recommend having two paths instead, one for training and one for inference; the inference path doesn't need to loop at all and can just repeat the inputs.
Thanks for the feedback, I will work on this.
…ility
- Training path: Keep efficient .nonzero() for performance
- Inference path: Use static loop for torch.export compatibility
- Add conditional check to skip empty experts in inference
- Update tests to validate inference mode export
- Addresses maintainer feedback on performance concerns
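A rough sketch of the dual-path routing described in this commit. The helper name `select_expert_indices` is hypothetical, introduced only for illustration; the actual change branches inside the MoE block itself:

```python
import torch


def select_expert_indices(expert_mask: torch.Tensor, num_experts: int, training: bool) -> torch.Tensor:
    # Hypothetical helper, for illustration only.
    if training:
        # Training path: visit only the experts that actually received tokens.
        # The data-dependent .nonzero() is fine here because training runs eagerly.
        return torch.greater(expert_mask.sum(dim=(-1, -2)), 0).nonzero().flatten()
    # Inference / export path: a static index range keeps the loop trip count
    # constant, which is what torch.export needs to trace the graph.
    return torch.arange(num_experts)
```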
Force-pushed from 952a181 to 0aa9de7
- Apply black formatting to fix code style
- Fix import sorting with isort
- Address CI code quality checks
- Fix import organization in modeling_mixtral.py
- Fix import organization in modular_mixtral.py
- Address ruff I001 import sorting warnings
- Remove manually edited modeling_mixtral.py
- Auto-generate from modular_mixtral.py using proper tool
- Ensure consistency between modular and generated files
- Fix check_repository_consistency CI failure
- Remove 'if top_x.shape[0] == 0: continue' check that causes GuardOnDataDependentSymNode error
- Empty expert tensors naturally contribute 0, no explicit check needed
- Update test error message for clarity
- Fixes tests_processors CI failure

Co-authored-by: ArthurZucker <[email protected]>
[For maintainers] Suggested jobs to run (before merge): run-slow: mixtral
What does this PR do?
This PR fixes a torch.export compatibility issue (#38518) with Mixtral MoE models that was introduced in PR #32429.
Problem
The optimization in PR #32429 introduced a .nonzero() operation that creates data-dependent tensor shapes, causing torch.export to fail with:
torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode: Could not extract specialized integer from data-dependent expression
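A minimal sketch of the failure mode, assuming nothing beyond standard PyTorch; `RouterToy` is an illustrative toy module written for this description, not the Mixtral code:

```python
import torch


class RouterToy(torch.nn.Module):
    # Illustrative toy module, not the Mixtral implementation.
    def forward(self, expert_mask: torch.Tensor) -> torch.Tensor:
        # .nonzero() yields a tensor whose length depends on the *values* in
        # expert_mask, so the loop below has a data-dependent trip count.
        expert_hit = torch.greater(expert_mask.sum(dim=(-1, -2)), 0).nonzero()
        out = expert_mask.sum() * 0.0
        for expert_idx in expert_hit:
            out = out + expert_idx.float().sum()
        return out


# Tracing this with torch.export should reproduce the guard error, since the
# exporter cannot turn the data-dependent length of `expert_hit` into a
# static integer:
#
#   torch.export.export(RouterToy(), (torch.randint(0, 2, (8, 2, 16)),))
```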
Solution
Replace the dynamic expert selection loop:
expert_hit = torch.greater(expert_mask.sum(dim=(-1, -2)), 0).nonzero()
for expert_idx in expert_hit:
With a static loop over all experts:
for expert_idx in range(self.num_experts):
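For context, here is a self-contained sketch of a sparse-MoE forward pass built around the static loop. It is an illustrative toy (the class name `ToySparseMoE`, layer sizes, and attribute names are assumptions), not the actual `MixtralSparseMoeBlock` code, but it follows the same routing structure:

```python
import torch
from torch import nn


class ToySparseMoE(nn.Module):
    # Illustrative toy, not the transformers implementation.
    def __init__(self, hidden_dim: int = 16, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.num_experts = num_experts
        self.top_k = top_k
        self.gate = nn.Linear(hidden_dim, num_experts, bias=False)
        self.experts = nn.ModuleList(nn.Linear(hidden_dim, hidden_dim) for _ in range(num_experts))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (num_tokens, hidden_dim)
        routing_weights = torch.softmax(self.gate(hidden_states), dim=-1)
        routing_weights, selected_experts = torch.topk(routing_weights, self.top_k, dim=-1)
        routing_weights = routing_weights / routing_weights.sum(dim=-1, keepdim=True)

        # One-hot mask of shape (num_experts, top_k, num_tokens): which tokens
        # were routed to which expert, and in which top-k slot.
        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)

        final_hidden_states = torch.zeros_like(hidden_states)
        # Static loop: the trip count is a Python constant, so torch.export
        # never has to specialize a data-dependent integer.
        for expert_idx in range(self.num_experts):
            idx, top_x = torch.where(expert_mask[expert_idx])
            current_state = hidden_states[top_x]
            weighted = self.experts[expert_idx](current_state) * routing_weights[top_x, idx, None]
            # An expert that received no tokens adds an empty tensor, i.e. a no-op.
            final_hidden_states.index_add_(0, top_x, weighted)
        return final_hidden_states
```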
Impact
Testing
Fixes torch.export compatibility issues reported for Mixtral-8x7B models.
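A hedged sketch of what an export smoke test could look like, using the `ToySparseMoE` toy from above rather than the real model (the PR's actual test lives in the transformers test suite):

```python
import torch

moe = ToySparseMoE(hidden_dim=16, num_experts=4, top_k=2).eval()
example_tokens = torch.randn(8, 16)

# With the static expert loop the trip count is constant, so tracing should no
# longer hit GuardOnDataDependentSymNode; with the .nonzero()-based selection
# it would fail during export.
exported = torch.export.export(moe, (example_tokens,))

# Sanity check: the exported graph should match eager execution.
torch.testing.assert_close(exported.module()(example_tokens), moe(example_tokens))
```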
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@Cyrilvallez
@ArthurZucker
@gante