[UPDATE]: add Train-Deploy Alignment module to README
Mark Train-Deploy Alignment as released, add update log entry,
check off to-do item, and replace Coming Soon placeholder with
full content (data augmentation, DAgger, inference quick start).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
README.md (53 additions & 10 deletions)
@@ -21,7 +21,7 @@
 - **[Model Arithmetic](#model-arithmetic)**: A weight-space merging strategy that combines models trained on different data subsets, efficiently capturing diverse knowledge without architectural complexity. **[Released]**
 - **[Stage Advantage](#stage-advantage)**: A stage-aware advantage estimator that provides stable, dense progress signals for policy training. **[Released]**
-- **[Train-Deploy Alignment](#train-deploy-alignment-coming-soon)**: Bridges the distribution gap via spatio-temporal augmentation, heuristic DAgger corrections, and temporal chunk-wise smoothing. **[Coming Soon]**
+- **[Train-Deploy Alignment](#train-deploy-alignment)**: Bridges the distribution gap via spatio-temporal augmentation, heuristic DAgger corrections, and temporal chunk-wise smoothing. **[Released]**

 χ₀ enables two sets of dual-arm robots to collaboratively orchestrate long-horizon garment manipulation — flattening, folding, and hanging — surpassing the state-of-the-art $\pi_{0.5}$ baseline by approximately 250% in success rate, with `only 20 hours of data and 8 A100 GPUs`.
+- [Feb 15 2026] Release of the **Train-Deploy Alignment** module: data augmentation (time scaling, space mirroring), DAgger data collection, inference with temporal smoothing/ensembling and RTC, and HDF5-to-LeRobot conversion.
 - [Feb 14 2026] Release of the **Stage Advantage** module: advantage estimator training, evaluation, GT labeling, and AWBC training pipeline.
 - [Feb 10 2026] Initial release of the **Model Arithmetic** module with support for both JAX and PyTorch checkpoints (not tested thoroughly).
 - [Feb 10 2026] χ₀ paper released.
@@ -210,8 +211,8 @@ Checkpoints are written to the config’s checkpoint directory. You can then use
 - [x] kai0 oracle: training and inference code with non-advantage data of three tasks
 - [x] Model Arithmetic: code of different baselines for weight-space interpolation
 - [x] Stage Advantage: code, data (advantage labels), and checkpoints
-- [ ] HuggingFace & ModelScope: upload Stage Advantage data and checkpoints — **Feb 14**
+- [ ] HuggingFace & ModelScope: upload Stage Advantage data and checkpoints

 ## Model Arithmetic
@@ -315,14 +316,56 @@ For a ready-to-use script with environment setup and automatic log management, s
 For the full pipeline details, configuration instructions, and all parameters, see [`stage_advantage/README.md`](stage_advantage/README.md).

-## Train-Deploy Alignment (Coming Soon)
+## Train-Deploy Alignment

-Train-Deploy Alignment bridges the distribution gap between training and real-world deployment through:
-- **Spatio-temporal augmentation**: Data augmentation including space mirroring and time scaling for dual-arm setups.
-- **Heuristic DAgger corrections**: Interactive on-robot data collection for iterative policy improvement.
-- **Temporal chunk-wise smoothing**: Smoothed action execution to reduce jitter during deployment.
+Train-Deploy Alignment bridges the distribution gap between training and real-world deployment through three sub-modules:

-**This module is currently under refinement and will be released soon.**
+- **Data Augmentation** (`train_deploy_alignment/data_augment/`): Time scaling (frame extraction at configurable rates), space mirroring (left/right arm swap + video flip), dataset merging, and HDF5-to-LeRobot format conversion.
+- **DAgger** (`train_deploy_alignment/dagger/`): Policy-in-the-loop data collection for both Agilex Piper and ARX X5 platforms. Operators run inference, switch to DAgger mode for human corrections, and save episodes (HDF5 + optional videos + intervention labels).
+- **Inference** (`train_deploy_alignment/inference/`): Deployment code for Agilex and ARX robots with multiple execution modes — synchronous, temporal smoothing, temporal ensembling, and **RTC (real-time chunking)**. Uses a two-machine setup (GPU policy server + robot IPC client).
+For full setup instructions (IPC environment, CAN, ROS/ROS2, platform-specific details), see [`train_deploy_alignment/README.md`](train_deploy_alignment/README.md).
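
For readers of this diff, the ideas named in the added README text (time scaling, space mirroring, temporal ensembling) can be illustrated with a minimal pure-Python sketch. This is not the repository's code: the function names `time_scale`, `mirror_action`, `flip_rows`, and `temporal_ensemble` are hypothetical, the action layout (left-arm values followed by right-arm values) is an assumption, and the exponential weighting is the scheme commonly used for temporal ensembling, which may differ from what `train_deploy_alignment/` implements. A real mirroring step would likely also need per-joint sign changes, which are omitted here.

```python
import math
from typing import List, Sequence


def time_scale(frames: Sequence, rate: float) -> List:
    """Resample an episode by picking frame int(i * rate) for i = 0, 1, 2, ...

    rate > 1 skips frames (faster motion); rate < 1 repeats frames (slower).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    out = []
    i = 0
    while int(i * rate) < len(frames):
        out.append(frames[int(i * rate)])
        i += 1
    return out


def mirror_action(action: Sequence[float], arm_dim: int) -> List[float]:
    """Swap the two arm halves of a dual-arm action vector.

    Assumes the layout [left arm (arm_dim values), right arm (arm_dim values)].
    """
    if len(action) != 2 * arm_dim:
        raise ValueError("action must contain exactly two arms")
    return list(action[arm_dim:]) + list(action[:arm_dim])


def flip_rows(image: Sequence[Sequence]) -> List[List]:
    """Flip an image (a list of pixel rows) horizontally."""
    return [list(reversed(row)) for row in image]


def temporal_ensemble(predictions: Sequence[Sequence[float]], m: float = 0.1) -> List[float]:
    """Blend several predictions for the same timestep into one action.

    predictions[0] is the oldest prediction; weights are exp(-m * i),
    so m = 0 gives a plain average and larger m favors older predictions.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    weights = [math.exp(-m * i) for i in range(len(predictions))]
    total = sum(weights)
    dim = len(predictions[0])
    return [
        sum(w * p[d] for w, p in zip(weights, predictions)) / total
        for d in range(dim)
    ]
```

As a usage illustration, `time_scale(list(range(10)), 2.0)` keeps every second frame, and `mirror_action([1, 2, 3, 4], arm_dim=2)` returns `[3, 4, 1, 2]`. Mirroring an episode would apply `mirror_action` to every action and `flip_rows` to every camera frame so that observations and actions stay consistent.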