Commit 0adfee8

Author: zhouyunsong

!34 Improve the documentation

* Update file: evaluation.md
* Update file: model.md
* Update file: add_benchmark.md
* Update file: train_eval.md
* Update file: installation.md
* Update file: add_dataset.md
* Update file: model.md
* update
* update train_val and training
* update add model
* Update file: train_eval.md
1 parent 11166cb commit 0adfee8

8 files changed: +384 -244 lines changed

source/en/user_guide/internmanip/quick_start/add_benchmark.md

Lines changed: 11 additions & 5 deletions
@@ -1,12 +1,15 @@
 # 🥇 Add a New Benchmark


-This guide walks you through adding a custom agent and custom evaluation benchmark to the InternManip framework.
+This guide walks you through **adding a custom benchmark** to the InternManip framework, including defining your own `Agent` and `Evaluator` classes, as well as registering and launching them.

-### 1. Implement Your Model Agent
+### 1. Define a Custom Agent


-To support a new model in InternManip, define a subclass of [`BaseAgent`](../../internmanip/agent/base.py). You must implement two core methods:
+In the updated design, an **Agent** is tied to the **benchmark (evaluation environment)** rather than to a specific policy model. It is responsible for interfacing between the environment and the control policy, handling observation preprocessing and action postprocessing, and coordinating resets.
+
+
+All agents must inherit from [`BaseAgent`](../../internmanip/agent/base.py) and implement the following two methods:

 - `step()`: given an observation, returns an action.
 - `reset()`: resets internal states, if needed.
@@ -106,7 +109,10 @@ eval_cfg = EvalCfg(
     agent=AgentCfg(
         agent_type="custom_agent",  # Corresponds to the name registered in AgentRegistry
         base_model_path="path/to/model",
-        model_kwargs={...},
+        agent_settings={...},
+        model_kwargs={
+            'HF_cache_dir': None,
+        },
         server_cfg=ServerCfg(  # Optional server configuration
             server_host="localhost",
             server_port=5000,
@@ -132,4 +138,4 @@ eval_cfg = EvalCfg(
 python scripts/eval/start_evaluator.py \
   --config scripts/eval/configs/custom_on_custom.py
 ```
-Use `--distributed` for Ray-based multi-GPU, and `--server` for client-server mode.
+> 💡 Use `--server` for client-server mode, and `--distributed` for Ray-based multi-GPU (WIP).
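
As a rough illustration of the agent contract described in the `add_benchmark.md` diff above (`step()` maps an observation to an action, `reset()` clears internal state), here is a minimal, self-contained sketch. `BaseAgentStub`, `EchoPolicy`, and the `"state"` observation key are hypothetical stand-ins; the real `BaseAgent` constructor and method signatures are not shown in this commit and may differ.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseAgentStub(ABC):
    """Hypothetical stand-in for BaseAgent: only the two documented methods."""

    @abstractmethod
    def step(self, obs: Dict[str, Any]) -> List[float]:
        ...

    @abstractmethod
    def reset(self) -> None:
        ...


class EchoPolicy:
    """Hypothetical policy: returns the robot state unchanged as the action."""

    def predict(self, state: List[float]) -> List[float]:
        return list(state)


class CustomAgent(BaseAgentStub):
    def __init__(self, policy: EchoPolicy) -> None:
        self.policy = policy
        self.history: List[List[float]] = []  # internal state cleared by reset()

    def step(self, obs: Dict[str, Any]) -> List[float]:
        state = [float(x) for x in obs["state"]]  # observation preprocessing
        action = self.policy.predict(state)  # policy inference
        action = [max(-1.0, min(1.0, a)) for a in action]  # postprocessing: clip to [-1, 1]
        self.history.append(action)
        return action

    def reset(self) -> None:
        self.history.clear()


agent = CustomAgent(EchoPolicy())
first_action = agent.step({"state": [0.5, 2.0, -3.0]})
agent.reset()
```

Keeping preprocessing and postprocessing inside the agent is what lets the same policy be reused across benchmarks with different observation and action conventions.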

source/en/user_guide/internmanip/quick_start/add_dataset.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ The process involves two main steps: **[ensuring the dataset format](#dataset-st

 ## Dataset Structure

-All datasets must follow the [LeRobotDataset Format](#https://github.com/huggingface/lerobot) to ensure compatibility with the data loaders and training pipelines.
+All datasets must follow the [LeRobotDataset Format](https://github.com/huggingface/lerobot) to ensure compatibility with the data loaders and training pipelines.
 The expected structure is:


source/en/user_guide/internmanip/quick_start/add_model.md

Lines changed: 58 additions & 24 deletions
@@ -41,8 +41,25 @@ Finally, you need to **register** the model with the framework and you can start

 ## 2. Create the Model Configuration File

-The config file is used to store the architecture related hyper-parameters. Here is some basic information you need to know:
-You shall add the model configuration file in `internmanip/configs/model/{model_name}_cfg.py`, which should inherit `transformers.PretrainedConfig`.
+Each model in our framework should define its architecture-related hyperparameters in a **configuration file**.
+These configuration classes inherit from `transformers.PretrainedConfig`, which provides serialization, deserialization, and compatibility with HuggingFace’s model loading utilities.
+
+You should place your model’s config file in:
+```bash
+internmanip/configs/model/{model_name}_cfg.py
+```
+
+**🧱 About transformers.PretrainedConfig**
+
+[`PretrainedConfig`](https://huggingface.co/docs/transformers/main_classes/configuration) is the base class for all HuggingFace model configurations. It supports:
+- Loading/saving config files via .from_pretrained() and .save_pretrained()
+- Managing default values
+- Providing shared arguments across training, inference, and serialization
+
+
+<!-- The config file is used to store the architecture related hyper-parameters. Here is some basic information you need to know:
+You shall add the model configuration file in `internmanip/configs/model/{model_name}_cfg.py`, which should inherit `transformers.PretrainedConfig`. -->
+

 The following is **an example** of a model configuration file:

@@ -53,32 +70,41 @@ class CustomPolicyConfig(PretrainedConfig):
     """Configuration for CustomPolicy."""
     model_type = "custom_model"

-    def __init__(self,
-                 vit_name="google/vit-base-patch16-224-in21k",
-                 freeze_vit=True,
-                 hidden_dim=256,
-                 output_dim=8,
-                 dropout=0.0,
-                 n_obs_steps=1,
-                 horizon=10,
-                 **kwargs):
+    """Model-specific parameters"""
+    vit_name = "google/vit-base-patch16-224-in21k"
+    freeze_vit = True
+    hidden_dim = 256
+    output_dim = 8
+    dropout = 0.0
+    n_obs_steps = 1
+    horizon = 10
+
+    def __init__(self, **kwargs):
         super().__init__(**kwargs)
-        self.vit_name = vit_name
-        self.freeze_vit = freeze_vit
-        self.hidden_dim = hidden_dim
-        self.output_dim = output_dim
-        self.dropout = dropout
-        self.n_obs_steps = n_obs_steps
-        self.horizon = horizon
+        for key, value in kwargs.items():
+            setattr(self, key, value)

     def transform(self) -> Tuple[List[Transform], List[int], List[int]]:
+        """
+        This method defines the input processing logic for the model.
+
+        It must return a 3-tuple:
+        - `transforms`: A list of preprocessing or augmentation operations applied to raw inputs.
+        - `obs_indices`: A list of time step indices used as observation input.
+        - `action_indices`: A list of time step indices the model needs to predict (action horizon).
+
+        You can customize `transforms` to include resizing, normalization, cropping, etc.
+        """
         transforms = None
         return transforms, list(range(self.n_obs_steps)), list(range(self.horizon))
 ```
+> 🔧 Important: All config classes must implement a transform() method that returns a 3-tuple.

 As shown in the example above, the config class defines key architectural hyperparameters—such as the backbone model name, whether to freeze the backbone, the hidden/output dimensions of the action head, and more. You are free to extend this config with any additional parameters required by your custom model.

-Additionally, you can implement a **model-specific `transform` method** within the config class. This method allows you to apply custom data transformations that are *not* included in the dataset-specific transform list defined in `internmanip/configs/dataset/data_config.py`.
+**🔧 About `transforms`**
+
+Additionally, you can implement a **model-specific `transform` method** within the config class. This method allows you to apply custom data transformations that are ***not*** included in the dataset-specific transform list defined in `internmanip/configs/dataset/data_config.py`.

 During training, the script `scripts/train/train.py` will automatically call this method and apply your custom transform alongside the default ones. Your `transform` method should follow the same input/output format as dataset-specific transform. For implementation guidance, refer to examples in the `internmanip/dataset/transform` directory.
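
To make the config pattern introduced in this hunk concrete, the following sketch mimics it without importing `transformers`: class attributes act as defaults, keyword arguments override them per instance, and `transform()` returns the `(transforms, obs_indices, action_indices)` 3-tuple. `ConfigStub` is a hypothetical stand-in for `PretrainedConfig` (it has no serialization), and the override values are made up for illustration.

```python
from typing import Any, List, Optional, Tuple


class ConfigStub:
    """Hypothetical stand-in for transformers.PretrainedConfig (no serialization)."""

    def __init__(self, **kwargs: Any) -> None:
        pass


class CustomPolicyConfig(ConfigStub):
    model_type = "custom_model"

    # Class-level defaults, mirroring the added lines of the diff.
    hidden_dim = 256
    n_obs_steps = 1
    horizon = 10

    def __init__(self, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        for key, value in kwargs.items():
            setattr(self, key, value)  # per-instance override of a class default

    def transform(self) -> Tuple[Optional[List[Any]], List[int], List[int]]:
        transforms = None  # no model-specific transforms in this sketch
        return transforms, list(range(self.n_obs_steps)), list(range(self.horizon))


default_cfg = CustomPolicyConfig()
custom_cfg = CustomPolicyConfig(n_obs_steps=2, horizon=4)
_, obs_indices, action_indices = custom_cfg.transform()
```

Because overrides are set on the instance, the class-level defaults stay untouched for any other config object created later.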

@@ -98,14 +124,21 @@ import torch.nn as nn, torch.nn.functional as F, torch
 from typing import Dict, Optional
 from internmanip.configs.model.custom_policy_cfg import CustomPolicyConfig

-@BasePolicyModel.register("custom_model")
 class CustomPolicyModel(BasePolicyModel):
     """ViT backbone + 2‑layer MLP head."""

-    def __init__(self, config: CustomPolicyConfig):
+    config_class = CustomPolicyConfig
+    name = "custom_model"
+
+    def __init__(
+        self,
+        config: Optional[CustomPolicyConfig] = None,
+        *args,
+        **kwargs
+    ):
         super().__init__(config)
         self.config = config
-        name = "custom_model"
+

         # 1 Backbone
         vit_conf = ViTConfig.from_pretrained(config.vit_name)
@@ -150,8 +183,9 @@ You need to define a data_collator function that converts a list of raw samples
 import torch
 import torch.nn as nn
 import torch.nn.functional as F
+from internmanip.configs.model.custom_cfg import CustomPolicyConfig

-@DataCollatorRegistry.register("custom_model")
+@DataCollatorRegistry.register(CustomPolicyConfig.model_type)
 def custom_data_collator(samples):
     imgs = torch.stack([s["image"] for s in samples])
     acts = torch.stack([s["action"] for s in samples])
@@ -174,7 +208,7 @@ AutoModel.register(CustomPolicyConfig, CustomPolicyModel)

 Make sure the string `"custom_model"` passed to `AutoConfig.register` matches the model name used in both your `CustomPolicyModel` definition and the data collator registration.

-Don't forget to register the module in your __init__.py, so that your custom model gets imported and initialized properly during runtime. For example:
+Don't forget to register the module in your `__init__.py`, so that your custom model gets imported and initialized properly during runtime. For example:

 ```python
 # In internmanip/model/basemodel/__init__.py

source/en/user_guide/internmanip/quick_start/installation.md

Lines changed: 1 addition & 1 deletion
@@ -391,7 +391,7 @@ This will start the policy server that listens for observation inputs and sends
 Activate the environment for Simpler-Env, and run the evaluator:
 ```bash
 source .venv/simpler-env/bin/activate
-python scripts/eval/start_evaluator.py --config run_configs/examples/internmanip_demo.py
+python scripts/eval/start_evaluator.py --config run_configs/examples/internmanip_demo.py --server
 ```

 This will run the evaluation loop that sends observations to the model server and executes returned actions in the environment.

source/en/user_guide/internmanip/quick_start/train_eval.md

Lines changed: 22 additions & 11 deletions
@@ -33,13 +33,13 @@ export PYTHONPATH="$(pwd):$PYTHONPATH"


 We provide several built-in policies such as **GR00T-N1**, **GR00T-N1.5**, **Pi-0**, **DP-CLIP**, and **ACT-CLIP**.
-To quickly verify your setup, you can train the **DP-CLIP** model on the `genmanip-demo` dataset (300 demonstrations of the instruction *"Move the milk carton to the top of the ceramic bowl"*).
+To quickly verify your setup, you can train the **Pi-0** model on the `genmanip-demo` dataset (300 demonstrations of the instruction *"Move the milk carton to the top of the ceramic bowl"*).
 This requires **1 GPU with at least 24GB memory**:

 ```bash
 # --nproc_per_node: number of processes per node, e.g., 1
 torchrun --nnodes 1 --nproc_per_node 1 \
   scripts/train/train.py \
-  --config run_configs/train/dp_clip_genmanip_v1.yaml  # Config file that specifies which model to train on which dataset, along with hyperparameters
+  --config run_configs/train/pi0_genmanip_v1.yaml  # Config file that specifies which model to train on which dataset, along with hyperparameters
 ```

 > 😄 When you run the script, it will prompt you to log in to Weights & Biases (WandB). This integration allows you to monitor your training process in real time via the WandB dashboard.
@@ -131,6 +131,7 @@ srun bash train_pi0_genmanip_slurm.sh
 ```bash
 sbatch multinode_submit.slurm
 ```
+> 💡 Tip: The recommended training setup is a global batch size of 2048 for 50,000 steps, which typically takes approximately 500 GPU hours (assuming each node has 8 GPUs).

 ## Customizing Training with Your Own YAML Config

@@ -148,11 +149,8 @@ base_model_path: lerobot/pi0 # (Optional) Overrides the model checkpoin
 **💡 Notes:**

 - `model_type`: Must match the name of a model that has already been registered within InternManip.
-
 - `dataset_path`: Can be a HuggingFace ID (e.g., `InternRobotics/InternData-GenmanipTest`) or a local directory where the dataset is downloaded.
-
 - `data_config`: Refers to a dataset configuration preset (e.g., for preprocessing or loading behavior), also pre-registered in the codebase.
-
 - `base_model_path`: This is optional. If the selected `model_type` is supported and known, InternManip will automatically resolve and download the correct checkpoint from HuggingFace. If you’ve already downloaded a model locally or want to use a custom one, you can specify the path here directly.

 By editing or extending this YAML file, you can quickly try different models, datasets, or training setups — all without modifying the training script.
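
The four fields in the notes above can be sanity-checked before launching a run. The sketch below is only an illustration and not part of InternManip: it assumes the YAML has already been parsed into a dictionary, treats every field except `base_model_path` as required, and uses a hypothetical `data_config` preset name (`genmanip_v1`) and an assumed `model_type` value (`pi0`).

```python
from typing import Any, Dict, List

# Assumption: everything except base_model_path (documented as optional) is required.
REQUIRED_KEYS = ("model_type", "dataset_path", "data_config")


def missing_keys(cfg: Dict[str, Any]) -> List[str]:
    """Return the required fields that are absent or empty in a parsed YAML config."""
    return [key for key in REQUIRED_KEYS if not cfg.get(key)]


# What a parsed training YAML might look like (values partly hypothetical).
train_cfg = {
    "model_type": "pi0",  # assumed registered model name
    "dataset_path": "InternRobotics/InternData-GenmanipTest",  # HuggingFace ID or local dir
    "data_config": "genmanip_v1",  # hypothetical preset name
    "base_model_path": "lerobot/pi0",  # optional checkpoint override
}

incomplete_cfg = {"model_type": "pi0"}

print(missing_keys(train_cfg))
print(missing_keys(incomplete_cfg))
```

A check like this fails fast on a misspelled or missing key instead of surfacing the problem partway through dataset loading.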
@@ -165,7 +163,6 @@ By editing or extending this YAML file, you can quickly try different models, da
 When creating your own YAML config file for training or evaluation, you can directly refer to the following officially supported values:

 - Use values from the `${model_type}` and `${base_model_path}` columns below to populate the corresponding fields in your YAML.
-
 - Similarly, values from the `${data_config}` and `${dataset_path}` columns can be used to specify the dataset configuration and loading path.

 <!-- This ensures consistency with the models and datasets that have been pre-registered within InternManip. -->
@@ -274,16 +271,30 @@ python scripts/eval/start_evaluator.py \
   --config scripts/eval/config/pi0_on_genmanip.py
 ``` -->

-## Evaluation and Benchmarking (WIP)
+## Evaluation and Benchmarking
+
+
+The default evaluation setup adopts a client-server architecture where the policy (model) and the environment run in separate processes. This improves compatibility and modularity for large-scale benchmarks.
+You can evaluate `pi0` on the `genmanip` benchmark by running the following commands in two terminals:


-By default, the inference of model will be running in the main loop sharing the same process with the `env`. You can evaluate `pi0` on the `Genmanip` benchmark in a single process using the following command:
+**🖥 Terminal 1: Launch the Policy Server (Model Side)**

+Activate the environment for the model and start the policy server:
 ```bash
-python scripts/eval/start_evaluator.py \
-  --config scripts/eval/config/pi0_on_genmanip.py
+source .venv/model/bin/activate
+python scripts/eval/start_policy_server.py
+```
+This server listens for observation inputs from the environment and responds with action predictions from the model.
+
+**🖥 Terminal 2: Launch the Evaluator (Environment Side)**
+```bash
+source .venv/simpler_env/bin/activate
+python scripts/eval/start_evaluator.py --config run_configs/eval/pi0_on_genmanip.py --server
 ```

+This client sends observations to the model server, receives actions, and executes them in the environment.
+

 <!-- ### 1. Client-Server Setup

@@ -327,7 +338,7 @@ The terminal prints SR (Success Rate) information for each episode and task:
 - **Intermediate results**: RGB images (if `is_save_img=True`), robot state information.
 - **Result summary**: A `result.json` file containing task-level and episode-level success rates (same as terminal output). -->

-You can view the images generated during evaluation in the `eval_results` directory.
+You can view the images generated during evaluation in the `logs/demo/gr00t_n1_on_simpler` directory.
