Fake balanced routing in MoE #1670
Open
We can set
DEBUG_FORCE_LOAD_BALANCED=1
to force each expert to receive the same number of tokens.

Reproduce:
DEBUG_FORCE_LOAD_BALANCED=1 CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/debug_model.toml" NGPU=4 ./run_train.sh --compile.enable
Here is a test on 8 layers with 8 activated and 64 total experts. The green curve is the vanilla run and the purple one is with forced load balancing.
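
For context, here is a minimal sketch of how such a flag could bypass a router's top-k selection with a deterministic round-robin assignment so every expert receives the same number of tokens. The function name `route_tokens` and this exact logic are illustrative assumptions, not the actual torchtitan implementation:

```python
import os
import torch

def route_tokens(scores: torch.Tensor, top_k: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Select top_k experts per token; optionally force a perfectly balanced assignment.

    scores: (num_tokens, num_experts) router logits or probabilities.
    Returns (weights, expert_idx), each of shape (num_tokens, top_k).
    """
    num_tokens, num_experts = scores.shape
    if os.environ.get("DEBUG_FORCE_LOAD_BALANCED") == "1":
        # Hypothetical debug path: assign experts round-robin so each expert
        # gets exactly num_tokens * top_k / num_experts tokens (when divisible).
        flat = torch.arange(num_tokens * top_k, device=scores.device) % num_experts
        expert_idx = flat.view(num_tokens, top_k)
        # Gather the real router scores for the forced experts so gradients
        # still flow through the gate.
        weights = scores.gather(1, expert_idx)
    else:
        # Normal path: pick the top_k highest-scoring experts per token.
        weights, expert_idx = torch.topk(scores, top_k, dim=-1)
    return weights, expert_idx
```

With the flag set, the token-to-expert load is uniform by construction, which is useful for isolating routing imbalance from other sources of loss or throughput differences.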
