
Commit 1293dbe

fix small issues
1 parent 4979722 commit 1293dbe

File tree

2 files changed

+1
-2
lines changed


_posts/2026-02-16-musclemimic.md

Lines changed: 1 addition & 2 deletions
@@ -136,6 +136,7 @@ Human motor control emerges from hundreds of muscles coordinating in real time,
 
 ## What It Looks Like
 
+In each video, the <span style="color:#7b68ee;">**blue/purple**</span> character shows the reference motion, while the other character shows the policy's inference result. The full-body locomotion tasks and the bimanual manipulation tasks are each driven by a separate policy.
 
 <div class="video-grid l-page">
 <figure>
@@ -290,8 +291,6 @@ The penalty $P_t = \max(-1,\; -\sum \lambda_p C_p)$ regularizes action bounds vi
 
 MuscleMimic is implemented as a JAX-based framework extending LocoMuJoCo<d-cite key="al2023locomujoco"></d-cite> with native MuJoCo Warp support for GPU-accelerated simulation. We train across 8,192 parallel environments for 4.9 billion timesteps using the Muon optimizer<d-cite key="jordan2024muon"></d-cite> for linear layers and Adam<d-cite key="DBLP:journals/corr/KingmaB14"></d-cite> for biases and normalization, which yields significantly faster convergence than AdamW<d-cite key="DBLP:conf/iclr/LoshchilovH19"></d-cite>. For training on diverse motion datasets, we use the KINESIS dataset<d-cite key="simos2025kinesis"></d-cite> (a curated subset of AMASS<d-cite key="mahmood2019amass"></d-cite>) and progressively scale to more dynamic motions including Embody3D<d-cite key="embody3d"></d-cite>.
 
-For large scale training, we use the Muon optimizer<d-cite key="jordan2024muon"></d-cite> for linear layers and Adam<d-cite key="DBLP:journals/corr/KingmaB14"></d-cite> for biases and normalization, which yields significantly faster and more stable convergence than AdamW<d-cite key="liu2025muon"></d-cite>.
-
 **Single-epoch updates work best.** With massively parallel GPU simulation, we can collect fresh data cheaply, so single-epoch updates ($E = 1$) achieve superior asymptotic performance while avoiding pathologies from aggressive sample reuse: expert collapse in Soft MoE routing and severe distribution shift with KL divergence spikes orders of magnitude above the stable baseline.
 
 {% include figure.html path="assets/img/musclemimic/epoch_ablation.png" alt="Effect of gradient epochs on training" caption="Effect of gradient epochs ($E$) on training stability. We compare $E=1$ (truly on-policy), $E=3$, and $E=10$ (aggressive sample reuse). (A) Early training (first 30M steps): higher $E$ accelerates initial learning. (B) Full training trajectory: $E=1$ achieves superior asymptotic performance. (C) KL divergence (log scale): $E>1$ exhibits catastrophic distribution shift with spikes exceeding $10^{10}$, whereas $E=1$ remains stable below $10^{-1}$." %}
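The second hunk's context line defines the penalty $P_t = \max(-1,\; -\sum \lambda_p C_p)$: weighted constraint costs are negated and clipped at $-1$. A minimal sketch of that formula, assuming illustrative names (`costs`, `lambdas`) not taken from the post:

```python
import numpy as np

def action_bound_penalty(costs, lambdas):
    """P_t = max(-1, -sum_p lambda_p * C_p).

    costs: per-constraint violation costs C_p (non-negative)
    lambdas: per-constraint weights lambda_p
    The weighted sum is negated (cost becomes negative reward) and
    clipped at -1 so the penalty cannot dominate the reward signal.
    """
    return max(-1.0, -float(np.dot(lambdas, costs)))
```

The clip at $-1$ bounds the penalty's magnitude regardless of how many constraints are violated.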
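The implementation paragraph assigns Muon to linear layers and Adam to biases and normalization parameters. A common way to express such a split (e.g. for feeding into `optax.multi_transform`) is to label each parameter by shape, since Muon's Newton-Schulz orthogonalization is defined for weight matrices. A hedged sketch; the labeling rule and parameter names are assumptions, not taken from the post:

```python
def optimizer_label(param_shape):
    # Muon orthogonalizes matrix-shaped updates, so route 2-D (and
    # higher) weights to Muon; 1-D biases and normalization
    # scales/offsets fall back to Adam.
    return "muon" if len(param_shape) >= 2 else "adam"

# Example: label a toy parameter tree by shape.
params = {
    "dense/kernel": (256, 128),   # linear weight  -> muon
    "dense/bias": (128,),         # bias           -> adam
    "layernorm/scale": (128,),    # norm scale     -> adam
}
labels = {name: optimizer_label(shape) for name, shape in params.items()}
```

The resulting label tree can then select between two optimizer transforms, one per label.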
