README.md: 15 additions & 19 deletions (+15 -19)
@@ -272,49 +272,45 @@ For gradient-based optimization, dataset splitting, and all other methods, see t
Stage Advantage decomposes long-horizon tasks into semantic stages and provides stage-aware advantage signals for policy training. It addresses the numerical instability of prior non-stage approaches by computing advantage as progress differentials within each stage, yielding smoother and more stable supervision.
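Reviewer note: the README does not spell out the formula behind "progress differentials within each stage". The Python sketch below shows one plausible reading. The function name `stage_advantage`, the `horizon` parameter, and the per-frame `stage_ids` list are hypothetical and not taken from this repository. It treats the advantage at frame t as the progress gained over the next `horizon` frames, with the look-ahead clamped so it never crosses a stage boundary.

```python
from typing import List


def stage_advantage(progress: List[float], stage_ids: List[int], horizon: int = 1) -> List[float]:
    """Assumed definition: advantage[t] = progress[t + h] - progress[t],
    where the look-ahead h is at most `horizon` and stays inside frame t's stage."""
    n = len(progress)
    advantages = []
    for t in range(n):
        end = t
        # Advance the look-ahead while we remain within the horizon and the same stage.
        while end + 1 < n and end - t < horizon and stage_ids[end + 1] == stage_ids[t]:
            end += 1
        advantages.append(progress[end] - progress[t])
    return advantages


# Two stages of three frames each; progress resets to 0 when the second stage starts.
progress = [0.0, 0.5, 1.0, 0.0, 0.25, 0.5]
stage_ids = [0, 0, 0, 1, 1, 1]
advantages = stage_advantage(progress, stage_ids, horizon=1)
print(advantages)  # [0.5, 0.5, 0.0, 0.25, 0.25, 0.0]
```

A naive frame-to-frame difference would produce a spurious -1.0 where progress resets from 1.0 to 0.0 at the stage boundary. Clamping the look-ahead to the current stage avoids that jump, which is consistent with the stability claim above.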
-**Stage 0 — GT Data Labeling**: Compute advantage values and discretize into `task_index` labels.
+**Step 0 — Annotate `stage_progress_gt`** (manual, no code provided): For each episode, annotate start/end timestamps and subtask split points, then compute per-frame `stage_progress_gt` (linear progress 0→1 within each subtask) and write it into the parquet files.
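Reviewer note: since Step 0 ships without code, here is a minimal Python sketch of the per-frame computation it describes. It assumes split points are given as frame indices where a new subtask begins (the README mentions timestamps, so a timestamp-to-frame conversion would come first). The name `stage_progress` and its arguments are hypothetical. Writing the result into the parquet files is left out.

```python
from typing import List


def stage_progress(num_frames: int, split_points: List[int]) -> List[float]:
    """Linear progress from 0 to 1 within each subtask of one episode.

    `split_points` holds the frame indices at which a new subtask starts
    (an assumed convention, not the repository's annotation format).
    """
    bounds = [0] + sorted(split_points) + [num_frames]
    progress = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        length = end - start
        for i in range(length):
            # A single-frame subtask is treated as already complete.
            progress.append(i / (length - 1) if length > 1 else 1.0)
    return progress


# One 7-frame episode split into two subtasks at frame 3.
stage_progress_gt = stage_progress(num_frames=7, split_points=[3])
```

Each subtask starts at 0.0 and ends at 1.0, so the first three frames of the example get 0.0, 0.5, 1.0 and the remaining four ramp from 0.0 to 1.0 again.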
+
+**Step 1 — Train Advantage Estimator**: Fine-tune a pi0-based model to predict advantage from observations.
```bash
+uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD --exp_name=run1 --save_interval 10000
```

-For batch labeling across multiple dataset variants, see `stage_advantage/annotation/gt_labeling.sh`.
-
-**Stage 1 — Train Advantage Estimator**: Fine-tune a pi0-based model to predict advantage from observations.
+**Step 2 — Predict Advantage**: Use the trained estimator to label datasets with `absolute_advantage` and `relative_advantage`.

```bash
-uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD --exp_name=run1 --save_interval 10000
+uv run python stage_advantage/annotation/eval.py Task-A KAI0 /path/to/dataset
```

-For a ready-to-use script with environment setup (conda/venv activation, DDP configuration) and automatic log management, see `stage_advantage/annotation/train_estimator.sh`.
-
-**Stage 2 — Advantage Estimation on New Data**: Use the trained estimator to label datasets with predicted advantage values.
+**Step 3 — Discretize Advantage**: Bin predicted advantages into positive/negative `task_index` labels.
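Reviewer note: the diff does not show how the bin boundary is chosen, so the Python sketch below only illustrates the general idea of Step 3. The fixed threshold of 0.0 and the encoding (1 = positive, 0 = negative) are assumptions, and `discretize` is a hypothetical helper rather than the repository's script.

```python
from typing import List


def discretize(advantages: List[float], threshold: float = 0.0) -> List[int]:
    """Map each predicted advantage to a binary label.

    Assumed encoding: 1 for advantage strictly above `threshold`, 0 otherwise.
    """
    return [1 if a > threshold else 0 for a in advantages]


predicted = [0.12, -0.03, 0.0, 0.4]
task_index = discretize(predicted)
print(task_index)  # [1, 0, 0, 1]
```

A percentile-based threshold computed per dataset would be an equally reasonable choice. The README should state which rule the actual script applies.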
```bash
-uv run python stage_advantage/annotation/eval.py Task-A KAI0 /path/to/dataset
-This directory contains **Stage 0** (GT labeling with `gt_label.py` / `gt_labeling.sh`), **Stage 1** (advantage estimator training via `scripts/train_pytorch.py`), and **Stage 2** (advantage estimation on new data via `eval.py`). All commands below assume you are at the **repository root** unless noted. Full pipeline and options are in the [parent README](../README.md).
+This directory contains **Step 1** (advantage estimator training via `scripts/train_pytorch.py`), **Step 2** (advantage prediction on data via `eval.py`), and **Step 3** (discretize advantages into positive/negative via `discretize_advantage.py`). All commands below assume you are at the **repository root** unless noted. Full pipeline and options are in the [parent README](../README.md).
### Quick Start
```bash
-# Step 1: Label a dataset with advantage-based task_index (GT labels from progress)
-# Edit DATA_PATH in gt_labeling.sh, then from repo root: