MoE speedrun: Mixtral load balance + v4 smoke preset #3094

Open
pc0618 wants to merge 7 commits into main from pc0618/pr-moe-speedrun-wandb-mfu

Conversation


@pc0618 pc0618 commented Feb 27, 2026

Summary

  • Mixtral: add equilibrium-bias load balancing + router fp32 option.
  • Speedrun: add olmoe_s preset for v4 smoke (conservative cross_entropy_block_size=512).
  • Speedrun: honor --seq-len for training by wiring train_seq_len.
  • Mixtral HF export: omit router_bias from state dict for compatibility.
  • Grugformer MoE: avoid v4 auto-axis sharding failures.
  • Speedrun/W&B: refresh logging defaults + archive/profiling flags for grugformer_moe runs.
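The equilibrium-bias load balancing mentioned above can be sketched as a per-expert bias that nudges the router toward uniform expert load without an auxiliary loss: the bias shifts top-k expert *selection* only, and is updated against the observed load. This is a minimal NumPy sketch of that idea, not the PR's actual Mixtral code; `update_router_bias`, `route`, and the update rule are illustrative names and choices.

```python
import numpy as np

def update_router_bias(bias, tokens_per_expert, lr=1e-3):
    # Sketch of an equilibrium-style bias update (hypothetical, not the
    # PR's exact rule): lower the bias of overloaded experts and raise it
    # for underloaded ones, so routing drifts toward uniform load.
    target = tokens_per_expert.mean()
    error = target - tokens_per_expert          # > 0 when underloaded
    return bias + lr * np.sign(error)

def route(logits, bias, k=2):
    # The bias affects which experts are *selected*; combine weights
    # would still come from the raw (optionally fp32) router logits.
    biased = logits + bias                      # (tokens, experts)
    return np.argsort(-biased, axis=-1)[:, :k]
```

Keeping the bias out of the combine weights is what makes this auxiliary-loss-free: gradients through the router are untouched, only the discrete selection shifts.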

Smoke test (Levanter, v4-8)

  • Command:
    uv run python -m marin.run.ray_run --cluster infra/marin-us-central2.yaml --tpu v4-8 --env_vars WANDB_MODE=online -- python experiments/speedrun/olmoe_1b7b_nemotron_40b.py --model olmoe_s --tpu-type v4-8 --global-batch-size 32 --seq-len 1024 --num-train-steps 20 --dataset nemotron_cc --run-suffix pr-smoke-v4-8-b32-s1024-t20-20260227-000049
  • Artifacts:
  • Result highlights: 20 steps, global_bs=32, seq_len=1024, model_size=158.69M params; MFU (model_flops / peak_hw_flops) ≈ 0.68%.
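The MFU number above is model flops over peak hardware flops. A minimal sketch of that calculation, assuming the standard dense estimate of 6·N flops per token (forward + backward); the PR's exact flop accounting, the step time, and the v4-8 peak-flops constant are assumptions here, not taken from the run:

```python
def approx_mfu(n_params, global_bs, seq_len, step_time_s, peak_flops):
    # 6 * N flops per token is the usual dense fwd+bwd estimate; for an
    # MoE this overcounts unless n_params is the *active* parameter count.
    tokens_per_step = global_bs * seq_len
    model_flops = 6 * n_params * tokens_per_step
    return model_flops / (step_time_s * peak_flops)
```

With the run's shapes (158.69M params, global_bs=32, seq_len=1024), the measured step time and the chosen peak-flops figure fully determine the reported ≈0.68%.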

- Document grugformer MoE entrypoints in docs/reports/grug-archive.md

- Add CLI switches for profiling, jaxpr/HLO artifact logging, and perfetto link generation

- Default to legacy axis resources + non-explicit mesh axes for higher MFU parity with Levanter MoE runs

- Use cached Nemotron Llama3 tokenized components in olmoe_1b7b speedrun and allow CE block-size override
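The profiling/artifact CLI switches described above could be wired up roughly as follows. This is a hedged sketch with argparse; the flag names (`--profile`, `--log-jaxpr-hlo`, `--perfetto-link`) are hypothetical stand-ins and may not match the actual speedrun CLI spelling.

```python
import argparse

def add_debug_flags(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    # Hypothetical flag names mirroring the PR description.
    parser.add_argument("--profile", action="store_true",
                        help="capture a profiler trace for the run")
    parser.add_argument("--log-jaxpr-hlo", action="store_true",
                        help="archive jaxpr/HLO dumps as run artifacts")
    parser.add_argument("--perfetto-link", action="store_true",
                        help="emit a Perfetto UI link for the captured trace")
    return parser
```

Keeping these as opt-in store_true flags means the default smoke-test invocation stays unchanged.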