
Commit 9d93078

[DEBUG]: update advantage annotation procedures
1 parent 4c94179 commit 9d93078

File tree: 9 files changed, +223 −467 lines


README.md

Lines changed: 15 additions & 19 deletions
@@ -272,49 +272,45 @@ For gradient-based optimization, dataset splitting, and all other methods, see t
 
 Stage Advantage decomposes long-horizon tasks into semantic stages and provides stage-aware advantage signals for policy training. It addresses the numerical instability of prior non-stage approaches by computing advantage as progress differentials within each stage, yielding smoother and more stable supervision.
 
-The full pipeline has four stages:
+The full pipeline has five steps:
 
 ```
-Stage 0: GT Labeling → Stage 1: Train Advantage Estimator → Stage 2: Advantage Estimation → Stage 3: AWBC Training
+Step 0: Annotate stage_progress_gt (manual) → Step 1: Train Advantage Estimator → Step 2: Predict Advantage → Step 3: Discretize Advantage → Step 4: AWBC Training
 ```
 
 ### Quick Start
 
-**Stage 0 — GT Data Labeling**: Compute advantage values and discretize into `task_index` labels.
+**Step 0 — Annotate `stage_progress_gt`** (manual, no code provided): For each episode, annotate start/end timestamps and subtask split points, then compute per-frame `stage_progress_gt` (linear progress 0→1 within each subtask) and write it into the parquet files.
+
+**Step 1 — Train Advantage Estimator**: Fine-tune a pi0-based model to predict advantage from observations.
 
 ```bash
-cd stage_advantage/annotation
-python gt_label.py <dataset_path> \
-    --threshold 30 --chunk-size 50 --discretion-type binary \
-    --advantage-source absolute_advantage
+uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD --exp_name=run1 --save_interval 10000
 ```
 
-For batch labeling across multiple dataset variants, see `stage_advantage/annotation/gt_labeling.sh`.
-
-**Stage 1 — Train Advantage Estimator**: Fine-tune a pi0-based model to predict advantage from observations.
+**Step 2 — Predict Advantage**: Use the trained estimator to label datasets with `absolute_advantage` and `relative_advantage`.
 
 ```bash
-uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD --exp_name=run1 --save_interval 10000
+uv run python stage_advantage/annotation/eval.py Task-A KAI0 /path/to/dataset
 ```
 
-For a ready-to-use script with environment setup (conda/venv activation, DDP configuration) and automatic log management, see `stage_advantage/annotation/train_estimator.sh`.
-
-**Stage 2 — Advantage Estimation on New Data**: Use the trained estimator to label datasets with predicted advantage values.
+**Step 3 — Discretize Advantage**: Bin predicted advantages into positive/negative `task_index` labels.
 
 ```bash
-uv run python stage_advantage/annotation/eval.py Task-A KAI0 /path/to/dataset
+cd stage_advantage/annotation
+python discretize_advantage.py <dataset_path> \
+    --threshold 30 --chunk-size 50 --discretion-type binary \
+    --advantage-source absolute_advantage
 ```
 
-For a ready-to-use script with environment setup and status logging, see `stage_advantage/annotation/eval.sh`.
+For batch labeling across PI06/KAI0 variants, see `stage_advantage/annotation/discretize_advantage.sh`.
 
-**Stage 3 — AWBC Training**: Train a policy with Advantage-Weighted Behavior Cloning.
+**Step 4 — AWBC Training**: Train a policy with Advantage-Weighted Behavior Cloning.
 
 ```bash
 XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi05_flatten_fold_awbc --exp_name=run1
 ```
 
-For a ready-to-use script with environment setup and automatic log management, see `stage_advantage/awbc/train_awbc.sh`.
-
 For the full pipeline details, configuration instructions, and all parameters, see [`stage_advantage/README.md`](stage_advantage/README.md).
 
 ## Train-Deploy Alignment
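The binary discretization that Step 3 describes (top 30% of advantages → positive label) can be sketched as follows. This is a minimal illustration, not the repository's `discretize_advantage.py`; it assumes predicted advantages are already loaded into a NumPy array, and mirrors the `--threshold 30 --discretion-type binary` setting shown above.

```python
import numpy as np

def binary_task_index(advantages: np.ndarray, threshold: float = 30.0) -> np.ndarray:
    """Label the top `threshold`% of frames by advantage as task_index=1
    and the rest as task_index=0 (the binary mode described above)."""
    # Frames at or above the (100 - threshold)th percentile fall in the
    # top threshold% of the advantage distribution.
    cutoff = np.percentile(advantages, 100.0 - threshold)
    return (advantages >= cutoff).astype(np.int64)

adv = np.array([0.1, 0.9, 0.2, 0.8, 0.5, 0.3, 0.7, 0.4, 0.6, 0.0])
labels = binary_task_index(adv, threshold=30.0)  # three frames labeled 1
```

With `--discretion-type n_slices`, the same idea generalizes to percentile boundaries at 100/n increments rather than a single cutoff.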

stage_advantage/README.md

Lines changed: 134 additions & 152 deletions
Large diffs are not rendered by default.
Lines changed: 14 additions & 19 deletions
@@ -1,36 +1,31 @@
-## Annotation: Stage 0–2 (Labeling, Estimator Training, Eval)
+## Annotation: Steps 1–3 (Estimator Training, Eval, Discretize)
 
-This directory contains **Stage 0** (GT labeling with `gt_label.py` / `gt_labeling.sh`), **Stage 1** (advantage estimator training via `scripts/train_pytorch.py`), and **Stage 2** (advantage estimation on new data via `eval.py`). All commands below assume you are at the **repository root** unless noted. Full pipeline and options are in the [parent README](../README.md).
+This directory contains **Step 1** (advantage estimator training via `scripts/train_pytorch.py`), **Step 2** (advantage prediction on data via `eval.py`), and **Step 3** (discretize advantages into positive/negative via `discretize_advantage.py`). All commands below assume you are at the **repository root** unless noted. Full pipeline and options are in the [parent README](../README.md).
 
 ### Quick Start
 
 ```bash
-# Step 1: Label a dataset with advantage-based task_index (GT labels from progress)
-# Edit DATA_PATH in gt_labeling.sh, then from repo root:
-bash stage_advantage/annotation/gt_labeling.sh
-
-# Step 2: Train the Advantage Estimator (update config.py repo_id / pytorch_weight_path first)
-# From repo root:
+# Step 1: Train the Advantage Estimator (update config.py repo_id / pytorch_weight_path first)
 uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD --exp_name=run1 --save_interval 10000
-# Or: uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_PI06_FLATTEN_FOLD --exp_name=run1 --save_interval 10000
 
-# Step 3: Evaluate the trained estimator on new data (PI06 or KAI0)
-# From repo root:
+# Step 2: Predict advantages on a dataset (update MODELS_CONFIG_MAP in eval.py first)
 uv run python stage_advantage/annotation/eval.py Task-A KAI0 /path/to/dataset
 
-# Step 4: Use the advantage-labeled data for AWBC (Stage 3)
-# After Stage 2, run gt_labeling.sh with DATA_PATH = eval repo (or gt_label.py --advantage-source absolute_advantage).
-# Then from repo root:
+# Step 3: Discretize advantages into positive/negative task_index labels
+# Edit DATA_PATH in discretize_advantage.sh, then:
+bash stage_advantage/annotation/discretize_advantage.sh
+
+# Step 4: AWBC training (see awbc/README.md)
 XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi05_flatten_fold_awbc --exp_name=run1
 ```
 
 ### File Descriptions
 
-| File | Stage | Description |
+| File | Step | Description |
 |---|---|---|
-| `gt_label.py` | 0 | Core script: computes advantage from progress/absolute_advantage and assigns `task_index` to parquet frames |
-| `gt_labeling.sh` | 0 | Batch labeling: prepares dataset dirs and runs `gt_label.py` (only .sh in this dir) |
-| `eval.py` | 2 | Evaluates a trained estimator on a dataset, writing predicted advantages to new parquets |
+| `discretize_advantage.py` | 3 | Reads advantage columns, bins into positive/negative `task_index`, writes `meta/tasks.jsonl` |
+| `discretize_advantage.sh` | 3 | Batch wrapper: prepares dataset dirs and runs `discretize_advantage.py` for PI06/KAI0 variants |
+| `eval.py` | 2 | Predicts advantage values on a dataset using a trained estimator |
 | `evaluator.py` | 2 | `SimpleValueEvaluator`: batched GPU inference with parallel video loading and prefetching |
 
-For Stage 0 parameters, Stage 1 config fields, Stage 2 `MODELS_CONFIG_MAP`, and end-to-end AWBC order, see the [parent README](../README.md).
+Step 1 training commands and Step 0 (manual annotation) are documented in the [parent README](../README.md).
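Step 0 ships no code in this commit. Purely as an illustration of the per-frame quantity it asks for (linear 0→1 progress within each annotated subtask), here is a sketch; the function name and frame-index split points are hypothetical, since the actual annotation is manual and works from timestamps.

```python
import numpy as np

def stage_progress_gt(n_frames: int, split_points: list[int]) -> np.ndarray:
    """Per-frame progress ramping linearly 0 -> 1 inside each subtask.

    split_points are the frame indices where a new subtask begins;
    episode start and end are implicit boundaries.
    """
    boundaries = [0] + sorted(split_points) + [n_frames]
    progress = np.zeros(n_frames, dtype=np.float32)
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        length = end - start
        if length > 1:
            # First frame of the subtask gets 0.0, last frame gets 1.0
            progress[start:end] = np.linspace(0.0, 1.0, length)
        else:
            progress[start:end] = 1.0
    return progress

# Example: a 10-frame episode with one subtask split at frame 4
p = stage_progress_gt(10, [4])
```

The resulting column would then be written back into the episode's parquet file alongside the existing frame data.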

stage_advantage/annotation/gt_label.py renamed to stage_advantage/annotation/discretize_advantage.py

Lines changed: 28 additions & 39 deletions
@@ -1,24 +1,24 @@
 #!/usr/bin/env python3
 """
-# python label.py <dataset_path> --threshold 30 --chunk-size 50 --discretion-type binary --advantage-source absolute_advantage --stage-nums 2 --dry-run
-Script to modify task_index in parquet files based on progress rewards.
+# python discretize_advantage.py <dataset_path> --threshold 30 --chunk-size 50 --discretion-type binary --advantage-source absolute_advantage --stage-nums 2 --dry-run
+Script to modify task_index in parquet files based on predicted advantage values.
 
 This script:
 1. Reads all parquet files from path/data/chunk-*/*.parquet
-2. Calculates reward as: progress[i+50] - progress[i] for each frame
-3. Computes reward distribution statistics across all parquets
-4. Labels frames with task_index based on reward percentile threshold
+2. Reads per-frame advantage from the specified source column (absolute_advantage or relative_advantage)
+3. Computes advantage distribution statistics across all parquets
+4. Labels frames with task_index based on advantage percentile threshold
 Binary mode:
-- task_index=0 for rewards in bottom (1-threshold)%
-- task_index=1 for rewards in top threshold%
+- task_index=0 for advantages in bottom (1-threshold)%
+- task_index=1 for advantages in top threshold%
 n_slices mode:
-- task_index=0 to (n-1) based on reward percentiles (higher reward -> higher task_index)
+- task_index=0 to (n-1) based on advantage percentiles (higher advantage -> higher task_index)
 - Each slice contains ~(100/n)% of frames
 
 Stage-based mode (--stage-nums > 1):
 - Each frame is assigned to a stage based on its stage_progress_gt value
 - Frames with stage_progress_gt in [i/stage_nums, (i+1)/stage_nums) belong to stage i
-- Each stage has its own reward statistics and percentile boundaries
+- Each stage has its own advantage statistics and percentile boundaries
 - task_index is assigned based on stage-specific percentiles
 """
 
@@ -35,38 +35,26 @@
 from tqdm import tqdm
 
 
-def calculate_rewards(data: pd.DataFrame, chunk_size: int = 50, advantage_source: str = "progress") -> np.ndarray:
+def calculate_rewards(data: pd.DataFrame, chunk_size: int = 50, advantage_source: str = "absolute_advantage") -> np.ndarray:
     """
-    Calculate rewards based on progress differences.
+    Read per-frame advantage values from the specified source column.
 
     Args:
-        data: DataFrame containing 'progress' column
-        chunk_size: Number of frames to look ahead for progress calculation
+        data: DataFrame containing the advantage column
+        chunk_size: Not used (kept for API compatibility)
+        advantage_source: Column name — "absolute_advantage" or "relative_advantage"
 
     Returns:
-        Array of rewards for each frame
+        Array of advantage values for each frame
     """
     n_frames = len(data)
-    rewards = np.zeros(n_frames, dtype=np.float32)
     if advantage_source == "absolute_advantage":
-        absolute_advantage = data['absolute_advantage'].values
-        for i in range(n_frames):
-            rewards[i] = absolute_advantage[i]
+        return data['absolute_advantage'].values.astype(np.float32)
     elif advantage_source == "relative_advantage":
-        relative_advantage = data['relative_advantage'].values
-        for i in range(n_frames):
-            rewards[i] = relative_advantage[i]
-    elif advantage_source == "progress":
-        progress = data['progress'].values
-        for i in range(n_frames):
-            if i + chunk_size < n_frames:
-                rewards[i] = progress[i + chunk_size] - progress[i]
-            else:
-                # For frames near the end, use the last available frame
-                rewards[i] = (progress[-1] - progress[i]) / (len(progress) - i) * chunk_size
+        return data['relative_advantage'].values.astype(np.float32)
     else:
-        raise ValueError(f"Unknown advantage source: {advantage_source}")
-    return rewards
+        raise ValueError(f"Unknown advantage source: {advantage_source}. "
+                         f"Must be 'absolute_advantage' or 'relative_advantage'.")
 
 
 def get_stage_index(stage_progress_gt: float, stage_nums: int) -> int:
@@ -91,7 +79,7 @@ def get_stage_index(stage_progress_gt: float, stage_nums: int) -> int:
     return stage_idx
 
 
-def collect_all_rewards(base_path: str, chunk_size: int = 50, advantage_source: str = "progress",
+def collect_all_rewards(base_path: str, chunk_size: int = 50, advantage_source: str = "absolute_advantage",
                         stage_nums: int = 1) -> Tuple[Dict[int, List[float]], List[str]]:
     """
     Collect all rewards from all parquet files to compute statistics.
@@ -223,9 +211,9 @@ def update_tasks_jsonl(base_path: str, discretion_type: str, n_slices: int = 10)
 def assign_task_index(parquet_file: str, threshold_percentile: float,
                       chunk_size: int = 50, discretion_type: str = "binary",
                       percentile_boundaries: List[float] = None, n_slices: int = 10,
-                      advantage_source: str = "progress") -> None:
+                      advantage_source: str = "absolute_advantage") -> None:
     """
-    Assign task_index to frames in a parquet file based on reward threshold.
+    Assign task_index to frames in a parquet file based on advantage threshold.
     (Used when stage_nums=1)
 
     Args:
@@ -269,7 +257,7 @@ def assign_task_index_staged(parquet_file: str,
                              chunk_size: int = 50,
                              discretion_type: str = "binary",
                              n_slices: int = 10,
-                             advantage_source: str = "progress",
+                             advantage_source: str = "absolute_advantage",
                              stage_nums: int = 1) -> None:
     """
     Assign task_index to frames in a parquet file based on stage-specific thresholds.
@@ -330,7 +318,7 @@ def assign_task_index_staged(parquet_file: str,
 
 def main():
     parser = argparse.ArgumentParser(
-        description="Modify task_index in parquet files based on progress rewards"
+        description="Discretize predicted advantage values into task_index labels"
     )
     parser.add_argument(
         "data_path",
@@ -365,8 +353,9 @@ def main():
     parser.add_argument(
         "--advantage-source",
         type=str,
-        default="progress",
-        choices=["progress", "absolute_advantage", "relative_advantage"]
+        default="absolute_advantage",
+        choices=["absolute_advantage", "relative_advantage"],
+        help="Which predicted advantage column to use (default: absolute_advantage)"
     )
     parser.add_argument(
         "--stage-nums",
@@ -396,7 +385,7 @@ def main():
         print(f"Threshold: {args.threshold}% (top {args.threshold}% will be task_index=1)")
     elif args.discretion_type == "n_slices":
         print(f"Number of slices: {args.n_slices}")
-        print(f"Progress offset: {args.chunk_size} frames")
+        print(f"Chunk size: {args.chunk_size} frames")
     print(f"Stage nums: {args.stage_nums}")
     if args.stage_nums > 1:
         step = 1.0 / args.stage_nums
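The half-open intervals [i/stage_nums, (i+1)/stage_nums) in the docstring above imply a simple stage lookup. The sketch below is consistent with that description, though the actual `get_stage_index` in the file may differ in details such as how `stage_progress_gt == 1.0` is handled (here it is clipped into the last stage).

```python
def get_stage_index(stage_progress_gt: float, stage_nums: int) -> int:
    """Map stage_progress_gt in [0, 1] to a stage index.

    Frames with stage_progress_gt in [i/stage_nums, (i+1)/stage_nums)
    belong to stage i; exactly 1.0 falls into the last stage.
    """
    stage_idx = int(stage_progress_gt * stage_nums)
    return min(stage_idx, stage_nums - 1)
```

With `stage_nums=2`, progress 0.49 maps to stage 0 and 0.5 to stage 1, so each stage's advantage percentiles are computed over a disjoint set of frames.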

stage_advantage/annotation/gt_labeling.sh renamed to stage_advantage/annotation/discretize_advantage.sh

Lines changed: 9 additions & 7 deletions
@@ -1,6 +1,8 @@
 #!/bin/bash
 ###############################################################################
-# Prepare advantage-labeled datasets for training the Advantage Estimator.
+# Discretize predicted advantages into positive/negative task_index labels
+# for AWBC training. Run this AFTER Stage 2 (eval.py) has produced
+# data_PI06_*/data_KAI0_* subdirs with advantage columns.
 ###############################################################################
 set -xe
 set -o pipefail
@@ -18,7 +20,7 @@ dir_name=$(dirname "$DATA_PATH")/${base_name}_advantage_data
 prepare_and_label() {
     local data_subdir=$1   # source data subfolder name (e.g. data_PI06_100000 or data_KAI0_100000)
     local output_name=$2   # output dataset name suffix
-    local extra_args=$3    # extra arguments for gt_label.py
+    local extra_args=$3    # extra arguments for discretize_advantage.py
     local target_path="${dir_name}/${output_name}"
 
     echo "============================================================"
@@ -32,7 +34,7 @@ prepare_and_label() {
     # Symlink videos (shared, read-only)
     ln -sfn "${DATA_PATH}/videos" "${target_path}/videos"
 
-    # Copy norm_stats and meta (will be modified by gt_label.py)
+    # Copy norm_stats and meta (will be modified by discretize_advantage.py)
     cp -f "${DATA_PATH}/norm_stats.json" "${target_path}/norm_stats.json"
     cp -rf "${DATA_PATH}/meta" "${target_path}/meta"
 
@@ -42,8 +44,8 @@ prepare_and_label() {
     fi
     cp -r "${DATA_PATH}/${data_subdir}" "${target_path}/data"
 
-    # Run gt_label.py to assign task_index and update tasks.jsonl
-    python "${SCRIPT_DIR}/gt_label.py" "${target_path}" \
+    # Run discretize_advantage.py to assign task_index and update tasks.jsonl
+    python "${SCRIPT_DIR}/discretize_advantage.py" "${target_path}" \
         --threshold 30 \
         --chunk-size 50 \
         --discretion-type binary \
@@ -67,6 +69,6 @@ echo " All datasets labeled successfully!"
 echo ""
 echo " Output directory: ${dir_name}"
 echo ""
-echo " Next step: set repo_id in config.py to the target dataset path,"
-echo " then run: uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_* --exp_name=run1 --save_interval 10000"
+echo " Next step: set repo_id in AWBC config to the target dataset path,"
+echo " then run: XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi05_*_awbc --exp_name=run1"
 echo "============================================================"

stage_advantage/annotation/eval.sh

Lines changed: 0 additions & 70 deletions
This file was deleted.
