Experiment: OLMoE Quantile Balancing (MoE Odyssey) on Nemotron #3124

## Reference
- MoE Odyssey #6: Optimal Allocation for Equilibrium (Quantile Balancing):
  https://datasets.osmarks.net/kexue/site/11619-MoE-Odyssey-6.-Optimal-Allocation-for-Equilibrium.html

## Objective
Run and evaluate a **true Quantile Balancing (QB)** load-balancing experiment for OLMoE, replacing the previous equilibrium proxy, whose auxiliary-loss dynamics were misleading.

## Context
We previously observed an unhealthy pattern in the old stab4-equilibrium implementation:
- `train/equilibrium_lb_bias_loss` became strongly negative, which made total `train/loss` hard to interpret.
- MoE routing could remain collapsed (high `moe/load_violation_max`) despite apparent loss improvement.

A corrected implementation is now in-tree:
- Alternating quantile QB target (`alpha`, `beta`, `b*`) per MoE Odyssey
- Bias-target objective (`0.5 * scale * ||b - stop_grad(b*)||^2`) instead of signed linear surrogate
- Config knob: `equilibrium_lb_iterations`
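
A minimal sketch of the bias-target objective above, in plain Python so it runs without a framework. It only illustrates the formula `0.5 * scale * ||b - stop_grad(b*)||^2`; the function names and the example bias/target values are hypothetical, and the computation of `b*` itself (the alternating quantile step) is not shown. Because the target is under `stop_grad`, the gradient flows only into `b` and equals `scale * (b - b*)`. With a positive scale, a squared distance is also bounded below by zero, so this term cannot drift negative the way a signed linear term can.

```python
def bias_target_loss(bias, target, scale):
    """0.5 * scale * ||b - b*||^2, with the target b* treated as a constant."""
    return 0.5 * scale * sum((b - t) ** 2 for b, t in zip(bias, target))


def bias_target_grad(bias, target, scale):
    """Gradient w.r.t. the bias only; stop_grad means nothing flows into b*."""
    return [scale * (b - t) for b, t in zip(bias, target)]


# Hypothetical per-expert router biases and QB targets (illustrative values).
bias = [0.30, -0.10, 0.00, -0.20]
target = [0.10, 0.00, 0.05, -0.15]
scale = 0.01  # equilibrium_lb_loss_scale from the experiment plan

loss = bias_target_loss(bias, target, scale)
grad = bias_target_grad(bias, target, scale)

# One plain gradient step moves every bias toward its target.
step_size = 10.0  # hypothetical, chosen only to make the change visible
stepped = [b - step_size * g for b, g in zip(bias, grad)]
stepped_loss = bias_target_loss(stepped, target, scale)

print(f"loss={loss:.6f} stepped_loss={stepped_loss:.6f}")
```

The loss is exactly zero when `b == b*`, which gives a simple sanity check for the logged `train/equilibrium_lb_bias_loss`: it should never be negative.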

## Experiment Plan
1. Run OLMoE-M `stab4` on `nemotron_cc` with:
   - TPU: `v5litepod-64`
   - Seq len: `4096`
   - Global batch: `128`
   - Token target: `40B`
   - LRs: `7.5e-4, 1e-3, 2e-3, 3e-3`
   - `equilibrium_lb_loss_scale=0.01`
   - `equilibrium_lb_iterations=5`
2. Compare against prior stability baselines (`olmoe_m`, `olmoe_m_stab3`) on the same dataset and compute envelope.
3. Evaluate first 2k, 10k, and 50k step windows before deciding full-run continuation for all LRs.
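
For orientation, the plan's settings can be turned into a rough step budget. The sketch below assumes that "40B" means 40e9 tokens, that the global batch is counted in sequences, and that every sequence is packed to the full 4096 tokens; the variable names are illustrative and not taken from the training config.

```python
import math

SEQ_LEN = 4096
GLOBAL_BATCH = 128                 # assumed to be counted in sequences
TOKEN_TARGET = 40_000_000_000      # assumes "40B" means 40e9 tokens
LEARNING_RATES = [7.5e-4, 1e-3, 2e-3, 3e-3]
EVAL_WINDOWS = [2_000, 10_000, 50_000]  # step windows from the plan

tokens_per_step = SEQ_LEN * GLOBAL_BATCH
total_steps = math.ceil(TOKEN_TARGET / tokens_per_step)


def window_summary(step):
    """Tokens consumed at `step` and the fraction of the token target."""
    tokens = step * tokens_per_step
    return {"step": step, "tokens": tokens, "fraction": tokens / TOKEN_TARGET}


windows = [window_summary(s) for s in EVAL_WINDOWS]

print(f"tokens/step={tokens_per_step:,} total_steps={total_steps:,}")
for w in windows:
    print(f"step {w['step']:>6,}: {w['tokens'] / 1e9:.2f}B tokens "
          f"({w['fraction']:.1%} of target)")
print(f"runs in sweep: {len(LEARNING_RATES)}")
```

Under these assumptions a full run is about 76k steps, and the three evaluation windows fall at roughly 1.0B, 5.2B, and 26.2B tokens, so the 50k-step check happens about two thirds of the way through the token budget.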

## Primary Metrics
- Training behavior:
  - `train/loss`
  - `train/router_z_loss`
  - `train/equilibrium_lb_bias_loss` (should stay non-negative, with no large drift)
- MoE balance:
  - `moe/load_violation_max`
  - `moe/equilibrium_rel_load_violation_max`
  - `moe/equilibrium_quantile_prob_mean`
  - per-layer expert load histograms / routing entropy
- Reliability:
  - no OOM / vmem failures
  - stable throughput / no chronic runtime-env failures
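
To make the balance metrics concrete, here is a small sketch of one common way to score expert load: the maximum relative violation `max(load) / mean(load) - 1` and a normalized entropy of the load distribution. Whether `moe/load_violation_max` and the in-tree routing entropy are computed exactly this way is an assumption, and the function names and toy assignments are hypothetical.

```python
import math
from collections import Counter


def expert_loads(assignments, num_experts):
    """Count routed tokens per expert from a flat list of expert indices."""
    counts = Counter(assignments)
    return [counts.get(e, 0) for e in range(num_experts)]


def max_load_violation(loads):
    """max(load) / mean(load) - 1: 0.0 when balanced, larger as routing collapses."""
    total = sum(loads)
    if total == 0:
        raise ValueError("no tokens were routed")
    return max(loads) / (total / len(loads)) - 1.0


def normalized_routing_entropy(loads):
    """Entropy of the load distribution divided by log(num_experts), in [0, 1]."""
    total = sum(loads)
    if total == 0:
        raise ValueError("no tokens were routed")
    probs = [l / total for l in loads if l > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(loads))


# Toy routing decisions for 16 tokens over 4 experts (illustrative only).
balanced = expert_loads([0, 1, 2, 3] * 4, num_experts=4)
collapsed = expert_loads([0] * 13 + [1, 2, 3], num_experts=4)

print("balanced :", balanced, max_load_violation(balanced))
print("collapsed:", collapsed, max_load_violation(collapsed))
```

In the collapsed example one expert receives 13 of 16 tokens, so the violation is 2.25 while the balanced case scores 0.0. This is the kind of gap that can persist even when `train/loss` looks healthy.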

## Success Criteria
- No pathological loss artifact from equilibrium term (no large negative drift masking CE behavior).
- MoE load balance improves vs. the pre-fix run (violation metrics trend downward or sit clearly lower).
- Training is stable and W&B logging stays online for all sweep runs.
- At least one LR candidate demonstrates healthy early-phase convergence and routing behavior.

## Current Run Artifact
- Running fixed sweep submission: `raysubmit_sF79814LVXC1Jxe6`
- Current W&B run (first LR):
  https://wandb.ai/marin-community/olmoe_m/runs/s4096_b128_euw4_v5l-174603-d4ba1c

## Tasks
- [ ] Compare QB-fixed stab4 vs prior stab3/base on matched intervals.
- [ ] Summarize whether QB should remain in stab4 default for future 40B-token sweeps.

