MoE speedrun: Mixtral load balance + v4 smoke preset #3094
Open
Conversation
- Document grugformer MoE entrypoints in docs/reports/grug-archive.md
- Add CLI switches for profiling, jaxpr/HLO artifact logging, and perfetto link generation
- Default to legacy axis resources + non-explicit mesh axes for higher MFU parity with levanter MoE runs
- Use cached Nemotron Llama3 tokenized components in the olmoe_1b7b speedrun and allow the CE block-size override
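The PR title refers to a Mixtral-style load balance. As a point of reference only (not the code in this PR), the standard Switch/Mixtral auxiliary load-balancing loss penalizes routers that concentrate tokens on a few experts; a minimal JAX sketch follows, with the function name and signature purely illustrative.

```python
import jax
import jax.numpy as jnp


def load_balance_loss(router_logits: jnp.ndarray, num_experts: int, top_k: int = 2) -> jnp.ndarray:
    """Switch/Mixtral-style auxiliary loss (illustrative sketch, not this PR's code).

    router_logits: [num_tokens, num_experts] pre-softmax router scores.
    """
    probs = jax.nn.softmax(router_logits, axis=-1)                    # [T, E]
    # f_i: how often expert i appears among each token's top-k choices.
    top_k_idx = jnp.argsort(probs, axis=-1)[:, -top_k:]               # [T, k]
    dispatch = jax.nn.one_hot(top_k_idx, num_experts).sum(axis=1)     # [T, E]
    tokens_per_expert = dispatch.mean(axis=0)                         # f_i
    # P_i: mean router probability assigned to expert i.
    router_prob_per_expert = probs.mean(axis=0)                       # P_i
    # Minimized when both dispatch counts and router probabilities are uniform.
    return num_experts * jnp.sum(tokens_per_expert * router_prob_per_expert)
```

In training this term is typically scaled by a small coefficient and added to the cross-entropy loss; the exact normalization and coefficient here are assumptions, not values taken from this PR.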
Summary
- Add an `olmoe_s` preset for v4 smoke runs (conservative `cross_entropy_block_size=512`).
- Honor `--seq-len` for training by wiring `train_seq_len`.
- Drop `router_bias` from the state dict for compatibility (see the sketch below).
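The last item drops `router_bias` entries so older checkpoints load into a model whose router has no bias term. A minimal sketch of such a filter (hypothetical helper name, not the PR's code) could look like:

```python
from typing import Any, Dict


def strip_router_bias(state_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `state_dict` without any router_bias parameters.

    Illustrative sketch; the actual key names and filtering logic in the PR may differ.
    """
    return {k: v for k, v in state_dict.items() if "router_bias" not in k}
```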
Smoke test (Levanter, v4-8)

```bash
uv run python -m marin.run.ray_run --cluster infra/marin-us-central2.yaml --tpu v4-8 --env_vars WANDB_MODE=online -- python experiments/speedrun/olmoe_1b7b_nemotron_40b.py --model olmoe_s --tpu-type v4-8 --global-batch-size 32 --seq-len 1024 --num-train-steps 20 --dataset nemotron_cc --run-suffix pr-smoke-v4-8-b32-s1024-t20-20260227-000049
```

Results:
- gs://marin-us-central2/checkpoints/speedrun/pr-smoke-v4-8-b32-s1024-t20-20260227-000049-f021f6/speedrun_results.json
- gs://marin-us-central2/experiments/olmoe_1b7b_nemotron_40b-eec246.json