@rakkit commented on Sep 1, 2025

We can set `DEBUG_FORCE_LOAD_BALANCED=1` to force each expert to receive the same number of tokens.
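
For context, here is a minimal sketch of what such an override could look like in a top-k MoE router. The helper name, signature, and tensor shapes are assumptions for illustration, not the actual torchtitan implementation:

```python
import os

import torch


def maybe_force_load_balanced(selected_experts: torch.Tensor,
                              num_experts: int) -> torch.Tensor:
    """If DEBUG_FORCE_LOAD_BALANCED=1, replace the router's top-k expert
    indices with a round-robin assignment so every expert receives the
    same number of token slots; otherwise return them unchanged.

    selected_experts: (num_tokens, top_k) integer tensor of expert ids.
    """
    if os.environ.get("DEBUG_FORCE_LOAD_BALANCED") != "1":
        return selected_experts
    num_tokens, top_k = selected_experts.shape
    # Assign token slots to experts cyclically: 0, 1, ..., E-1, 0, 1, ...
    slots = torch.arange(num_tokens * top_k, device=selected_experts.device)
    return (slots % num_experts).view(num_tokens, top_k)


# Demo: 16 tokens, top-2 routing over 8 experts.
routed = maybe_force_load_balanced(torch.randint(0, 8, (16, 2)), num_experts=8)
# With the env var set, every expert appears exactly (16 * 2) / 8 = 4 times:
print(torch.bincount(routed.flatten(), minlength=8))
```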

Reproduce: `DEBUG_FORCE_LOAD_BALANCED=1 CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/debug_model.toml" NGPU=4 ./run_train.sh --compile.enable`

Here is a test on 8 layers with 8 activated experts out of 64 total. The green curve is the vanilla run and the purple one uses forced load balancing:
[training curves: vanilla (green) vs. forced load balance (purple)]
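
As a sanity check on the expected distribution for this config: with top-8 routing over 64 experts, forced balancing should give every expert exactly top_k / num_experts = 1/8 of all token slots. A quick check under the round-robin scheme sketched above (token count chosen arbitrarily):

```python
import torch

num_experts, top_k, num_tokens = 64, 8, 1024
slots = torch.arange(num_tokens * top_k) % num_experts
counts = torch.bincount(slots, minlength=num_experts)
# 1024 tokens * 8 slots / 64 experts = 128 token slots per expert
assert (counts == num_tokens * top_k // num_experts).all()
```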
