InternRobotics
diff --git a/source/en/user_guide/internmanip/quick_start/add_benchmark.md b/source/en/user_guide/internmanip/quick_start/add_benchmark.md
Lines changed: 11 additions & 5 deletions
diff --git a/source/en/user_guide/internmanip/quick_start/add_dataset.md b/source/en/user_guide/internmanip/quick_start/add_dataset.md
Lines changed: 1 addition & 1 deletion
diff --git a/source/en/user_guide/internmanip/quick_start/add_model.md b/source/en/user_guide/internmanip/quick_start/add_model.md
Lines changed: 58 additions & 24 deletions
diff --git a/source/en/user_guide/internmanip/quick_start/installation.md b/source/en/user_guide/internmanip/quick_start/installation.md
Lines changed: 1 addition & 1 deletion
diff --git a/source/en/user_guide/internmanip/quick_start/train_eval.md b/source/en/user_guide/internmanip/quick_start/train_eval.md
Lines changed: 22 additions & 11 deletions
@@ -1,12 +1,15 @@
 # 🥇 Add a New Benchmark
 
 
-This guide walks you through adding a custom agent and custom evaluation benchmark to the InternManip framework.
+This guide walks you through **adding a custom benchmark** to the InternManip framework: defining your own `Agent` and `Evaluator` classes, then registering and launching them.
 
-### 1. Implement Your Model Agent
+### 1. Define a Custom Agent
 
 
-To support a new model in InternManip, define a subclass of [`BaseAgent`](../../internmanip/agent/base.py). You must implement two core methods:
+In the updated design, an **Agent** is tied to the **benchmark (evaluation environment)** rather than to a specific policy model. It is responsible for interfacing between the environment and the control policy, handling observation preprocessing and action postprocessing, and coordinating resets.
+
+
+All agents must inherit from [`BaseAgent`](../../internmanip/agent/base.py) and implement the following two methods:
 
 - `step()`: given an observation, returns an action.
 - `reset()`: resets internal states, if needed.
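
A minimal sketch of the `step()`/`reset()` contract listed above. `BaseAgentStub` is a hypothetical stand-in defined here so the snippet runs on its own; the real `BaseAgent` constructor, its registration mechanism, and the observation/action formats are not shown in this diff and may differ. The history window and the clipping range are illustrative assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List


class BaseAgentStub(ABC):
    """Hypothetical stand-in for BaseAgent; the real class may differ."""

    @abstractmethod
    def step(self, obs: Dict[str, Any]) -> List[float]:
        ...

    @abstractmethod
    def reset(self) -> None:
        ...


class CustomAgent(BaseAgentStub):
    """Wraps a policy callable and keeps a short observation history."""

    def __init__(
        self,
        policy: Callable[[List[Dict[str, Any]]], List[float]],
        n_obs_steps: int = 1,
    ):
        self.policy = policy
        self.n_obs_steps = n_obs_steps
        self.history: List[Dict[str, Any]] = []

    def step(self, obs: Dict[str, Any]) -> List[float]:
        # Preprocess: keep only the most recent n_obs_steps observations.
        self.history.append(obs)
        self.history = self.history[-self.n_obs_steps:]
        action = self.policy(self.history)
        # Postprocess: clip each action dimension to [-1, 1] (illustrative choice).
        return [max(-1.0, min(1.0, a)) for a in action]

    def reset(self) -> None:
        # Clear per-episode state so the next episode starts fresh.
        self.history.clear()
```

The point of the split is that the agent owns environment-facing glue (history, pre/postprocessing), while the policy stays a plain callable that can be swapped without touching the benchmark.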
@@ -106,7 +109,10 @@ eval_cfg = EvalCfg(
     agent=AgentCfg(
         agent_type="custom_agent", # Corresponds to the name registered in AgentRegistry
         base_model_path="path/to/model",
-        model_kwargs={...},
+        agent_settings={...},
+        model_kwargs={
+            'HF_cache_dir': None,
+        },
         server_cfg=ServerCfg(  # Optional server configuration
             server_host="localhost",
             server_port=5000,
@@ -132,4 +138,4 @@ eval_cfg = EvalCfg(
 python scripts/eval/start_evaluator.py \
   --config scripts/eval/configs/custom_on_custom.py
 ```
-Use `--distributed` for Ray-based multi-GPU, and `--server` for client-server mode.
+> 💡 Use `--server` for client-server mode, and `--distributed` for Ray-based multi-GPU (WIP).
@@ -7,7 +7,7 @@ The process involves two main steps: **[ensuring the dataset format](#dataset-st
 
 ## Dataset Structure
 
-All datasets must follow the [LeRobotDataset Format](#https://github.com/huggingface/lerobot) to ensure compatibility with the data loaders and training pipelines.
+All datasets must follow the [LeRobotDataset Format](https://github.com/huggingface/lerobot) to ensure compatibility with the data loaders and training pipelines.
 The expected structure is:
 
 
 
@@ -41,8 +41,25 @@ Finally, you need to **register** the model with the framework and you can start
 
 ## 2. Create the Model Configuration File
 
-The config file is used to store the architecture related hyper-parameters. Here is some basic information you need to know:
-You shall add the model configuration file in `internmanip/configs/model/{model_name}_cfg.py`, which should inherit `transformers.PretrainedConfig`.
+Each model in our framework should define its architecture-related hyperparameters in a **configuration file**. 
+These configuration classes inherit from `transformers.PretrainedConfig`, which provides serialization, deserialization, and compatibility with HuggingFace’s model loading utilities.
+
+You should place your model’s config file in:
+```bash
+internmanip/configs/model/{model_name}_cfg.py
+```
+
+**🧱 About `transformers.PretrainedConfig`**
+
+[`PretrainedConfig`](https://huggingface.co/docs/transformers/main_classes/configuration) is the base class for all HuggingFace model configurations. It supports:
+- Loading/saving config files via `.from_pretrained()` and `.save_pretrained()`
+- Managing default values
+- Providing shared arguments across training, inference, and serialization
+
+
+<!-- The config file is used to store the architecture related hyper-parameters. Here is some basic information you need to know:
+You shall add the model configuration file in `internmanip/configs/model/{model_name}_cfg.py`, which should inherit `transformers.PretrainedConfig`. -->
+
 
 The following is **an example** of a model configuration file:
 
@@ -53,32 +70,41 @@ class CustomPolicyConfig(PretrainedConfig):
     """Configuration for CustomPolicy."""
     model_type = "custom_model"
 
-    def __init__(self,
-                 vit_name="google/vit-base-patch16-224-in21k",
-                 freeze_vit=True,
-                 hidden_dim=256,
-                 output_dim=8,
-                 dropout=0.0,
-                 n_obs_steps=1,
-                 horizon=10,
-                 **kwargs):
+    """Model-specific parameters"""
+    vit_name = "google/vit-base-patch16-224-in21k"
+    freeze_vit = True
+    hidden_dim = 256
+    output_dim = 8
+    dropout = 0.0
+    n_obs_steps = 1
+    horizon = 10
+
+    def __init__(self, **kwargs):
         super().__init__(**kwargs)
-        self.vit_name = vit_name
-        self.freeze_vit = freeze_vit
-        self.hidden_dim = hidden_dim
-        self.output_dim = output_dim
-        self.dropout = dropout
-        self.n_obs_steps = n_obs_steps
-        self.horizon = horizon
+        for key, value in kwargs.items():
+            setattr(self, key, value)
 
     def transform(self) -> Tuple[List[Transform], List[int], List[int]]:
+        """
+        This method defines the input processing logic for the model.
+        
+        It must return a 3-tuple:
+        - `transforms`: A list of preprocessing or augmentation operations applied to raw inputs.
+        - `obs_indices`: A list of time step indices used as observation input.
+        - `action_indices`: A list of time step indices the model needs to predict (action horizon).
+
+        You can customize `transforms` to include resizing, normalization, cropping, etc.
+        """
         transforms = None
         return transforms, list(range(self.n_obs_steps)), list(range(self.horizon))
 ```
+> 🔧 Important: All config classes must implement a `transform()` method that returns a 3-tuple.
 
 As shown in the example above, the config class defines key architectural hyperparameters—such as the backbone model name, whether to freeze the backbone, the hidden/output dimensions of the action head, and more. You are free to extend this config with any additional parameters required by your custom model.
 
-Additionally, you can implement a **model-specific `transform` method** within the config class. This method allows you to apply custom data transformations that are *not* included in the dataset-specific transform list defined in `internmanip/configs/dataset/data_config.py`.
+**🔧 About `transform`**
+
+The **model-specific `transform` method** in the config class lets you apply custom data transformations that are ***not*** included in the dataset-specific transform list defined in `internmanip/configs/dataset/data_config.py`.
 
 During training, the script `scripts/train/train.py` automatically calls this method and applies your custom transforms alongside the default ones. Your `transform` method should follow the same input/output format as the dataset-specific transforms. For implementation guidance, refer to the examples in the `internmanip/dataset/transform` directory.
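
To make the 3-tuple concrete, here is a small sketch of how the two index lists could be used to pick frames out of a trajectory. `transform_spec` mirrors the return shape of the example `transform()` above; `select_window` is a hypothetical helper, and treating the indices as offsets from the current step (clamped at the end of the episode) is an assumption, not something this diff confirms about the InternManip data loader.

```python
from typing import List, Sequence, Tuple


def transform_spec(n_obs_steps: int, horizon: int) -> Tuple[list, List[int], List[int]]:
    """Return (transforms, obs_indices, action_indices) like the example config."""
    transforms: list = []  # no extra model-specific transforms in this sketch
    return transforms, list(range(n_obs_steps)), list(range(horizon))


def select_window(frames: Sequence, start: int, offsets: List[int]) -> list:
    """Hypothetical helper: pick frames at start + offset, clamped to the last frame."""
    last = len(frames) - 1
    return [frames[min(start + k, last)] for k in offsets]


_, obs_idx, act_idx = transform_spec(n_obs_steps=1, horizon=3)
actions = ["a0", "a1", "a2", "a3"]
window = select_window(actions, start=2, offsets=act_idx)
```

With `horizon=3` and `start=2`, the window runs past the end of the four-step episode, so the last action is repeated to fill the horizon under this clamping assumption.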
 
@@ -98,14 +124,21 @@ import torch.nn as nn, torch.nn.functional as F, torch
 from typing import Dict, Optional
 from internmanip.configs.model.custom_policy_cfg import CustomPolicyConfig
 
-@BasePolicyModel.register("custom_model")
 class CustomPolicyModel(BasePolicyModel):
     """ViT backbone + 2‑layer MLP head."""
 
-    def __init__(self, config: CustomPolicyConfig):
+    config_class = CustomPolicyConfig
+    name = "custom_model"
+
+    def __init__(
+        self, 
+        config: Optional[CustomPolicyConfig] = None,
+        *args,
+        **kwargs
+    ):
         super().__init__(config)
         self.config = config
-        name = "custom_model"
+        
 
         # 1 Backbone
         vit_conf = ViTConfig.from_pretrained(config.vit_name)
@@ -150,8 +183,9 @@ You need to define a data_collator function that converts a list of raw samples
 import torch
 import torch.nn as nn
 import torch.nn.functional as F
+from internmanip.configs.model.custom_policy_cfg import CustomPolicyConfig
 
-@DataCollatorRegistry.register("custom_model")
+@DataCollatorRegistry.register(CustomPolicyConfig.model_type)
 def custom_data_collator(samples):
     imgs = torch.stack([s["image"] for s in samples])
     acts = torch.stack([s["action"] for s in samples])
@@ -174,7 +208,7 @@ AutoModel.register(CustomPolicyConfig, CustomPolicyModel)
 
 Make sure the string `"custom_model"` passed to `AutoConfig.register` matches the model name used in both your `CustomPolicyModel` definition and the data collator registration.
 
-Don't forget to register the module in your __init__.py, so that your custom model gets imported and initialized properly during runtime. For example:
+Don't forget to register the module in your `__init__.py`, so that your custom model gets imported and initialized properly during runtime. For example:
 
 ```python
 # In internmanip/model/basemodel/__init__.py
 
@@ -391,7 +391,7 @@ This will start the policy server that listens for observation inputs and sends
 Activate the environment for Simpler-Env, and run the evaluator:
 ```bash
 source .venv/simpler-env/bin/activate
-python scripts/eval/start_evaluator.py --config run_configs/examples/internmanip_demo.py
+python scripts/eval/start_evaluator.py --config run_configs/examples/internmanip_demo.py --server
 ```
 
 This will run the evaluation loop that sends observations to the model server and executes returned actions in the environment.
 
@@ -33,13 +33,13 @@ export PYTHONPATH="$(pwd):$PYTHONPATH"
 
 
 We provide several built-in policies such as **GR00T-N1**, **GR00T-N1.5**, **Pi-0**, **DP-CLIP**, and **ACT-CLIP**.
-To quickly verify your setup, you can train the **DP-CLIP** model on the `genmanip-demo` dataset (300 demonstrations of the instruction *"Move the milk carton to the top of the ceramic bowl"*).
+To quickly verify your setup, you can train the **Pi-0** model on the `genmanip-demo` dataset (300 demonstrations of the instruction *"Move the milk carton to the top of the ceramic bowl"*).
 This requires **1 GPU with at least 24GB memory**:
 
 ```bash
 # --nproc_per_node sets the number of processes per node (e.g., 1)
 torchrun --nnodes 1 --nproc_per_node 1 \
    scripts/train/train.py \
-   --config run_configs/train/dp_clip_genmanip_v1.yaml # Config file that specifies which model to train on which dataset, along with hyperparameters
+   --config run_configs/train/pi0_genmanip_v1.yaml # Config file that specifies which model to train on which dataset, along with hyperparameters
 ```
 
 > 😄 When you run the script, it will prompt you to log in to Weights & Biases (WandB). This integration allows you to monitor your training process in real time via the WandB dashboard.
@@ -131,6 +131,7 @@ srun bash train_pi0_genmanip_slurm.sh
 ```bash
 sbatch multinode_submit.slurm
 ```
+> 💡 Tip: The recommended training setup is a global batch size of 2048 for 50,000 steps, which typically takes approximately 500 GPU hours (assuming each node has 8 GPUs).
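
The figures in the tip can be turned into a quick budget estimate. The sketch below is plain arithmetic on the numbers quoted above; the linear-scaling assumption and the decomposition of the global batch into per-GPU batch, GPU count, and gradient accumulation are assumptions for illustration, not settings taken from the training script.

```python
GLOBAL_BATCH_SIZE = 2048   # from the tip above
TRAIN_STEPS = 50_000       # from the tip above
TOTAL_GPU_HOURS = 500      # approximate figure from the tip above
GPUS_PER_NODE = 8          # from the tip above


def samples_seen(batch_size: int, steps: int) -> int:
    """Total training samples processed over the run."""
    return batch_size * steps


def wall_clock_hours(gpu_hours: float, num_nodes: int,
                     gpus_per_node: int = GPUS_PER_NODE) -> float:
    """Ideal wall-clock time, assuming perfect linear scaling across GPUs."""
    return gpu_hours / (num_nodes * gpus_per_node)


def per_gpu_batch(global_batch: int, num_nodes: int,
                  gpus_per_node: int = GPUS_PER_NODE, grad_accum: int = 1) -> float:
    """Per-GPU micro-batch, assuming global = per_gpu * num_gpus * grad_accum."""
    return global_batch / (num_nodes * gpus_per_node * grad_accum)


print(samples_seen(GLOBAL_BATCH_SIZE, TRAIN_STEPS))
print(wall_clock_hours(TOTAL_GPU_HOURS, num_nodes=1))
print(per_gpu_batch(GLOBAL_BATCH_SIZE, num_nodes=4))
```

Under these assumptions, a single 8-GPU node would need roughly 62.5 hours of wall-clock time, and real runs will take longer once communication overhead is included.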
 
 ## Customizing Training with Your Own YAML Config
 
@@ -148,11 +149,8 @@ base_model_path: lerobot/pi0          # (Optional) Overrides the model checkpoin
 **💡 Notes:**
 
 - `model_type`: Must match the name of a model that has already been registered within InternManip.
-
 - `dataset_path`: Can be a HuggingFace ID (e.g., `InternRobotics/InternData-GenmanipTest`) or a local directory where the dataset is downloaded.
-
 - `data_config`: Refers to a dataset configuration preset (e.g., for preprocessing or loading behavior), also pre-registered in the codebase.
-
 - `base_model_path`: This is optional. If the selected `model_type` is supported and known, InternManip will automatically resolve and download the correct checkpoint from HuggingFace. If you’ve already downloaded a model locally or want to use a custom one, you can specify the path here directly.
 
 By editing or extending this YAML file, you can quickly try different models, datasets, or training setups — all without modifying the training script.
@@ -165,7 +163,6 @@ By editing or extending this YAML file, you can quickly try different models, da
 When creating your own YAML config file for training or evaluation, you can directly refer to the following officially supported values:
 
 - Use values from the `${model_type}` and `${base_model_path}` columns below to populate the corresponding fields in your YAML.
-
 - Similarly, values from the `${data_config}` and `${dataset_path}` columns can be used to specify the dataset configuration and loading path.
 
 <!-- This ensures consistency with the models and datasets that have been pre-registered within InternManip. -->
@@ -274,16 +271,30 @@ python scripts/eval/start_evaluator.py \
    --config scripts/eval/config/pi0_on_genmanip.py
 ``` -->
 
-## Evaluation and Benchmarking (WIP)
+## Evaluation and Benchmarking
+
+
+The default evaluation setup adopts a client-server architecture where the policy (model) and the environment run in separate processes. This improves compatibility and modularity for large-scale benchmarks.
+You can evaluate `pi0` on the `genmanip` benchmark using two terminals, one for the policy server and one for the evaluator:
 
 
-By default, the inference of model will be running in the main loop sharing the same process with the `env`. You can evaluate `pi0` on the `Genmanip` benchmark in a single process using the following command:
+**🖥 Terminal 1: Launch the Policy Server (Model Side)**
 
+Activate the environment for the model and start the policy server:
 ```bash
-python scripts/eval/start_evaluator.py \
-   --config scripts/eval/config/pi0_on_genmanip.py
+source .venv/model/bin/activate
+python scripts/eval/start_policy_server.py
+```
+This server listens for observation inputs from the environment and responds with action predictions from the model.
+
+**🖥 Terminal 2: Launch the Evaluator (Environment Side)**
+```bash
+source .venv/simpler_env/bin/activate
+python scripts/eval/start_evaluator.py --config run_configs/eval/pi0_on_genmanip.py --server
 ```
 
+This client sends observations to the model server, receives actions, and executes them in the environment.
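
The observation/action exchange described above can be sketched with the standard library alone. This is not the InternManip protocol: the snippet swaps the real network transport for an in-process `multiprocessing.Pipe` and a thread, and `policy_server`, `run_episode`, the `None` shutdown signal, and the message shapes are all hypothetical names and conventions chosen for illustration.

```python
import threading
from multiprocessing import Pipe


def policy_server(conn, policy):
    """Model side: answer each observation with an action until None arrives."""
    while True:
        obs = conn.recv()
        if obs is None:  # hypothetical shutdown signal
            break
        conn.send(policy(obs))
    conn.close()


def run_episode(conn, observations):
    """Environment side: send an observation, block until the action returns."""
    actions = []
    for obs in observations:
        conn.send(obs)
        actions.append(conn.recv())
    conn.send(None)  # tell the server the episode is over
    return actions


def halve(obs):
    # Toy policy: scale every state value by 0.5.
    return [v * 0.5 for v in obs["state"]]


server_end, client_end = Pipe()
server = threading.Thread(target=policy_server, args=(server_end, halve))
server.start()
result = run_episode(client_end, [{"state": [1.0, 2.0]}, {"state": [4.0]}])
server.join()
```

Keeping the two sides behind a message boundary like this is what lets the model and the simulator live in separate virtual environments, as in the two-terminal setup.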
+
 
 <!-- ### 1. Client-Server Setup
 
@@ -327,7 +338,7 @@ The terminal prints SR (Success Rate) information for each episode and task:
 - **Intermediate results**: RGB images (if `is_save_img=True`), robot state information.
 - **Result summary**: A `result.json` file containing task-level and episode-level success rates (same as terminal output). -->
 
-You can view the images generated during evaluation in the `eval_results` directory.
+You can view the images generated during evaluation in the `logs/demo/gr00t_n1_on_simpler` directory.