
Commit 9d350ca

ChonghaoSima and claude committed
[UPDATE]: add Train-Deploy Alignment module to README
Mark Train-Deploy Alignment as released, add update log entry, check off to-do item, and replace Coming Soon placeholder with full content (data augmentation, DAgger, inference quick start). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent b7f3c49 · commit 9d350ca

1 file changed: +53 −10 lines

README.md

Lines changed: 53 additions & 10 deletions
@@ -21,7 +21,7 @@
  - **[Model Arithmetic](#model-arithmetic)**: A weight-space merging strategy that combines models trained on different data subsets, efficiently capturing diverse knowledge without architectural complexity. **[Released]**
  - **[Stage Advantage](#stage-advantage)**: A stage-aware advantage estimator that provides stable, dense progress signals for policy training. **[Released]**
- - **[Train-Deploy Alignment](#train-deploy-alignment-coming-soon)**: Bridges the distribution gap via spatio-temporal augmentation, heuristic DAgger corrections, and temporal chunk-wise smoothing. **[Coming Soon]**
+ - **[Train-Deploy Alignment](#train-deploy-alignment)**: Bridges the distribution gap via spatio-temporal augmentation, heuristic DAgger corrections, and temporal chunk-wise smoothing. **[Released]**

  χ₀ enables two sets of dual-arm robots to collaboratively orchestrate long-horizon garment manipulation — flattening, folding, and hanging — surpassing the state-of-the-art $\pi_{0.5}$ baseline by approximately 250% in success rate, with `only 20 hours of data and 8 A100 GPUs`.
2727

@@ -47,13 +47,14 @@ https://github.com/user-attachments/assets/3f5f0c48-ff3f-4b9b-985b-59ad0b2ea97c
  - [Workflow](#workflow)
  - [Quick Start](#quick-start)
  - [Stage Advantage](#stage-advantage)
- - [Train-Deploy Alignment (Coming Soon)](#train-deploy-alignment-coming-soon)
+ - [Train-Deploy Alignment](#train-deploy-alignment)
  - [Citation](#licenseandcitation)
  - [Troubleshooting](#troubleshooting)
  - [Links and Community](#links-and-community)

  ## Update

+ - [Feb 15 2026] Release of the **Train-Deploy Alignment** module: data augmentation (time scaling, space mirroring), DAgger data collection, inference with temporal smoothing/ensembling and RTC, and HDF5-to-LeRobot conversion.
  - [Feb 14 2026] Release of the **Stage Advantage** module: advantage estimator training, evaluation, GT labeling, and AWBC training pipeline.
  - [Feb 10 2026] Initial release of the **Model Arithmetic** module with support for both JAX and PyTorch checkpoints (not tested thoroughly).
  - [Feb 10 2026] χ₀ paper released.
@@ -210,8 +211,8 @@ Checkpoints are written to the config’s checkpoint directory. You can then use
  - [x] kai0 oracle: training and inference code with non-advantage data of three tasks
  - [x] Model Arithmetic: code of different baselines for weight-space interpolation
  - [x] Stage Advantage: code, data (advantage labels), and checkpoints
- - [ ] HuggingFace & ModelScope: upload Stage Advantage data and checkpoints — **Feb 14**
- - [ ] Train-Deploy Alignment — **Feb 14**
+ - [x] Train-Deploy Alignment: data augmentation, DAgger, inference (temporal smoothing, ensembling, RTC)
+ - [ ] HuggingFace & ModelScope: upload Stage Advantage data and checkpoints

  ## Model Arithmetic
@@ -315,14 +316,56 @@ For a ready-to-use script with environment setup and automatic log management, s

  For the full pipeline details, configuration instructions, and all parameters, see [`stage_advantage/README.md`](stage_advantage/README.md).

- ## Train-Deploy Alignment (Coming Soon)
+ ## Train-Deploy Alignment

- Train-Deploy Alignment bridges the distribution gap between training and real-world deployment through:
- - **Spatio-temporal augmentation**: Data augmentation including space mirroring and time scaling for dual-arm setups.
- - **Heuristic DAgger corrections**: Interactive on-robot data collection for iterative policy improvement.
- - **Temporal chunk-wise smoothing**: Smoothed action execution to reduce jitter during deployment.
+ Train-Deploy Alignment bridges the distribution gap between training and real-world deployment through three sub-modules:

- **This module is currently under refinement and will be released soon.**
+ - **Data Augmentation** (`train_deploy_alignment/data_augment/`): Time scaling (frame extraction at configurable rates), space mirroring (left/right arm swap + video flip), dataset merging, and HDF5-to-LeRobot format conversion.
+ - **DAgger** (`train_deploy_alignment/dagger/`): Policy-in-the-loop data collection for both Agilex Piper and ARX X5 platforms. Operators run inference, switch to DAgger mode for human corrections, and save episodes (HDF5 + optional videos + intervention labels).
+ - **Inference** (`train_deploy_alignment/inference/`): Deployment code for Agilex and ARX robots with multiple execution modes — synchronous, temporal smoothing, temporal ensembling, and **RTC (real-time chunking)**. Uses a two-machine setup (GPU policy server + robot IPC client).
+
+ ### Quick Start
+
+ **Data Augmentation — Time scaling:**
+
+ ```bash
+ python train_deploy_alignment/data_augment/time_scaling.py \
+     --src_path /path/to/source --tgt_path /path/to/extracted --repo_id extracted_dataset \
+     --extraction_factor 2
+ ```
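The `--extraction_factor` flag above controls how aggressively frames are dropped. As a rough illustration (the function name and list-of-timesteps layout are hypothetical, not the repo's API), time scaling amounts to keeping every k-th timestep of an episode:

```python
# Illustrative sketch of time-scaling augmentation: subsample an episode's
# timesteps at a fixed extraction factor so the motion appears faster.
# `time_scale` and the data layout are hypothetical, not the repo's API.

def time_scale(timesteps, extraction_factor=2):
    """Keep every `extraction_factor`-th timestep, starting from the first."""
    if extraction_factor < 1:
        raise ValueError("extraction_factor must be >= 1")
    return timesteps[::extraction_factor]

episode = list(range(10))       # stand-in for 10 (observation, action) pairs
halved = time_scale(episode, extraction_factor=2)
print(halved)                   # [0, 2, 4, 6, 8]
```

With `extraction_factor=2` the episode length halves, which exposes the policy to faster execution speeds at training time.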
+
+ **Data Augmentation — Space mirroring (mirror + merge):**
+
+ ```bash
+ python train_deploy_alignment/data_augment/space_mirroring.py full \
+     --src-path /path/to/original --mirror-path /path/to/mirrored --merge-path /path/to/merged \
+     --repo-id my_dataset
+ ```
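Space mirroring swaps the two arms' channels and flips the camera view. A minimal sketch of the idea, assuming a flat `[left | right]` joint vector with 7 DoF per arm and a row-major image (both layout assumptions are mine, not the repo's):

```python
# Sketch of space mirroring for one dual-arm sample: swap the left- and
# right-arm joint vectors and horizontally flip the camera image. The state
# layout ([left | right], 7 DoF per arm) and the function name are
# assumptions; a real robot also needs per-joint sign flips that depend on
# the arm's kinematics, which this sketch omits.

def mirror_sample(state, image, dof_per_arm=7):
    """state: flat list [left joints | right joints]; image: rows of pixels."""
    left = state[:dof_per_arm]
    right = state[dof_per_arm:2 * dof_per_arm]
    mirrored_state = right + left                   # swap the two arms
    mirrored_image = [row[::-1] for row in image]   # flip along the width axis
    return mirrored_state, mirrored_image

state = list(range(14))          # joints 0-6: left arm, 7-13: right arm
image = [["a", "b", "c"], ["d", "e", "f"]]
m_state, m_image = mirror_sample(state, image)
print(m_state[:3])               # [7, 8, 9]
print(m_image[0])                # ['c', 'b', 'a']
```

The mirrored copies are then merged with the originals, doubling the dataset while preserving left/right symmetry.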
+
+ **DAgger — Agilex:** Start the policy server on the GPU host, then on the IPC:
+
+ ```bash
+ conda activate kai0_inference
+ python train_deploy_alignment/dagger/agilex/agilex_openpi_dagger_collect.py \
+     --host <gpu_host_ip> --port 8000 --ctrl_type joint --use_temporal_smoothing --chunk_size 50 \
+     --dataset_name <your_dataset_name>
+ ```
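During DAgger collection, each saved timestep records whether the operator was intervening, so corrected segments can be identified later. A toy sketch of that bookkeeping (the function and episode layout are illustrative only; the real script also logs images and joint states to HDF5):

```python
# Toy sketch of DAgger-style bookkeeping: per step, store the executed
# action plus a 0/1 intervention label marking human corrections. The
# `collect_step` helper and episode layout are hypothetical, not the
# repo's actual recording format.

def collect_step(policy_action, human_action=None):
    """Return (executed_action, intervened) for one control step."""
    if human_action is not None:
        return human_action, 1    # operator overrode the policy
    return policy_action, 0       # autonomous step

episode = [
    collect_step(0.10),
    collect_step(0.20, human_action=0.55),   # human correction
    collect_step(0.30),
]
labels = [flag for _, flag in episode]       # [0, 1, 0]
```

Retraining on episodes that include these corrected states is what closes the loop: the policy sees its own failure states paired with expert recoveries.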
+
+ **Inference — Agilex (temporal smoothing):** Start the policy server on the GPU host, then on the IPC:
+
+ ```bash
+ conda activate kai0_inference
+ python inference/agilex_inference_openpi_temporal_smoothing.py \
+     --host <gpu_host_ip> --port 8000 --ctrl_type joint --use_temporal_smoothing --chunk_size 50
+ ```
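Temporal smoothing/ensembling reduces jitter by blending overlapping chunk predictions instead of executing each 50-step chunk open-loop. One common scheme is exponential weighting in the style of ACT; the weighting constant and function below are my assumptions, not necessarily what the repo implements:

```python
import math

# Sketch of temporal ensembling: several overlapping chunks each predict
# an action for the current timestep; blend them with exponentially
# decaying weights (oldest prediction weighted highest, as in ACT).
# Illustrative only.

def ensemble_action(predictions, m=0.1):
    """predictions[0] is from the oldest chunk still covering this step."""
    weights = [math.exp(-m * i) for i in range(len(predictions))]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total
```

With `m=0` this degenerates to a plain average over chunks; larger `m` trusts the earliest, most committed prediction more, trading responsiveness for smoothness.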
+
+ **Inference — ARX (RTC mode):** Start the policy server with an RTC config, then on the IPC:
+
+ ```bash
+ python inference/arx_openpi_inference_rtc.py --host <gpu_host_ip> --port 8000 --rtc_mode --chunk_size 50
+ ```
+
+ For full setup instructions (IPC environment, CAN, ROS/ROS2, platform-specific details), see [`train_deploy_alignment/README.md`](train_deploy_alignment/README.md).

  ## License and Citation

0 commit comments
