
Commit de0280a

zrz-sh, XuS1994, and andylin-hao authored
feat(embodiment): add robocasa environment for openpi models (RLinf#437)
Signed-off-by: zhangruize <2290321870@qq.com>
Signed-off-by: Hao Lin <linhaomails@gmail.com>
Co-authored-by: xusi <xusiforwork@gmail.com>
Co-authored-by: Hao Lin <linhaomails@gmail.com>
1 parent d7205ce commit de0280a

File tree

24 files changed: +2214 / -8 lines


.github/workflows/docker-build.yml

Lines changed: 59 additions & 0 deletions
@@ -298,6 +298,65 @@ jobs:
           NO_MIRROR=true
         outputs: type=cacheonly
         tags: rlinf:embodied-calvin
+
+  build-embodied-robocasa:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Maximize storage space
+        run: |
+          # Remove Java (JDKs)
+          sudo rm -rf /usr/lib/jvm
+
+          # Remove .NET SDKs
+          sudo rm -rf /usr/share/dotnet
+
+          # Remove Swift toolchain
+          sudo rm -rf /usr/share/swift
+
+          # Remove Haskell (GHC)
+          sudo rm -rf /usr/local/.ghcup
+
+          # Remove Julia
+          sudo rm -rf /usr/local/julia*
+
+          # Remove Android SDKs
+          sudo rm -rf /usr/local/lib/android
+
+          # Remove Chromium (optional if not using for browser tests)
+          sudo rm -rf /usr/local/share/chromium
+
+          # Remove Microsoft/Edge and Google Chrome builds
+          sudo rm -rf /opt/microsoft /opt/google
+
+          # Remove Azure CLI
+          sudo rm -rf /opt/az
+
+          # Remove PowerShell
+          sudo rm -rf /usr/local/share/powershell
+
+          # Remove CodeQL and other toolcaches
+          sudo rm -rf /opt/hostedtoolcache
+
+          docker system prune -af || true
+          docker builder prune -af || true
+          df -h
+
+      - name: Checkout code
+        uses: actions/checkout@v5
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Build embodied-behavior
+        uses: docker/build-push-action@v6
+        with:
+          file: ./docker/Dockerfile
+          push: false
+          build-args: |
+            BUILD_TARGET=embodied-robocasa
+            NO_MIRROR=true
+          outputs: type=cacheonly
+          tags: rlinf:embodied-robocasa
 
   build-embodied-isaaclab:
     runs-on: ubuntu-latest

.github/workflows/embodied-e2e-tests.yml

Lines changed: 28 additions & 0 deletions
@@ -261,6 +261,34 @@ jobs:
           source .venv/bin/activate
           bash tests/e2e_tests/embodied/run.sh calvin_ppo_openpi
 
+      - name: Clean up
+        run: |
+          rm -rf .venv
+          uv cache prune
+
+  embodied-openpi-robocasa-test:
+    runs-on: embodied
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v5
+
+      - name: Create embodied environment
+        run: |
+          unset UV_DEFAULT_INDEX
+          export UV_PATH=/workspace/dataset/.uv
+          export UV_LINK_MODE=symlink
+          export UV_CACHE_DIR=/workspace/dataset/.uv_cache
+          export UV_PYTHON_INSTALL_DIR=/workspace/dataset/.uv_python
+          export ROBOCASA_PATH=/workspace/dataset/robocasa
+          bash requirements/install.sh embodied --model openpi --env robocasa
+
+      - name: Robocasa GRPO test
+        timeout-minutes: 20
+        run: |
+          export REPO_PATH=$(pwd)
+          source .venv/bin/activate
+          bash tests/e2e_tests/embodied/run.sh robocasa_grpo_openpi
+
       - name: Clean up
         run: |
           rm -rf .venv

README.md

Lines changed: 3 additions & 2 deletions
@@ -30,6 +30,7 @@ RLinf is a flexible and scalable open-source infrastructure designed for post-tr
 
 
 ## What's NEW!
+- [2025/12] 🔥 RLinf supports reinforcement learning fine-tuning for [RoboCasa](https://github.com/robocasa/robocasa). Doc: [RL on Robocasa](https://rlinf.readthedocs.io/en/latest/rst_source/examples/robocasa.html).
 - [2025/12] 🎉 RLinf official release of [v0.1](https://github.com/RLinf/RLinf/releases/tag/v0.1).
 - [2025/11] 🔥 RLinf supports reinforcement learning fine-tuning for [CALVIN](https://github.com/mees/calvin). Doc: [RL on CALVIN](https://rlinf.readthedocs.io/en/latest/rst_source/examples/calvin.html).
 - [2025/11] 🔥 RLinf supports reinforcement learning fine-tuning for [IsaacLab](https://github.com/isaac-sim/IsaacLab). Doc: [RL on IsaacLab](https://rlinf.readthedocs.io/en/latest/rst_source/examples/isaaclab.html).
@@ -71,7 +72,7 @@ RLinf is a flexible and scalable open-source infrastructure designed for post-tr
         <li><a href="https://rlinf.readthedocs.io/en/latest/rst_source/examples/metaworld.html">MetaWorld</a> ✅</li>
         <li><a href="https://rlinf.readthedocs.io/en/latest/rst_source/examples/isaaclab.html">IsaacLab</a> ✅</li>
         <li><a href="https://rlinf.readthedocs.io/en/latest/rst_source/examples/calvin.html">CALVIN</a> ✅</li>
-        <li>RoboCasa</li>
+        <li><a href="https://rlinf.readthedocs.io/en/latest/rst_source/examples/robocasa.html">RoboCasa</a> ✅</li>
         <li>More...</li>
       </ul>
     </td>
@@ -562,7 +563,7 @@ and exhibits greater stability.
 - [X] Support for Vision-Language Models (VLMs) training
 - [ ] Support for deep searcher agent training
 - [ ] Support for multi-agent training
-- [ ] Support for integration with more embodied simulators (e.g., [RoboCasa](https://github.com/robocasa/robocasa), [GENESIS](https://github.com/Genesis-Embodied-AI/Genesis), [RoboTwin](https://github.com/RoboTwin-Platform/RoboTwin))
+- [ ] Support for integration with more embodied simulators (e.g., [GENESIS](https://github.com/Genesis-Embodied-AI/Genesis), [RoboTwin](https://github.com/RoboTwin-Platform/RoboTwin))
 - [ ] Support for more Vision Language Action models (VLAs) (e.g., [WALL-OSS](https://huggingface.co/x-square-robot/wall-oss-flow))
 - [ ] Support for world model
 - [ ] Support for real-world RL embodied intelligence

README.zh-CN.md

Lines changed: 4 additions & 4 deletions
(Chinese README; content lines translated to English below, hunk-header context kept verbatim.)

@@ -30,6 +30,7 @@ RLinf 是一个灵活且可扩展的开源框架,专为利用强化学习进
 
 
 ## What's New
+- [2025/12] 🔥 Reinforcement learning fine-tuning based on [RoboCasa](https://github.com/robocasa/robocasa) is now available! Docs: [RL on RoboCasa](https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/robocasa.html)
 - [2025/12] 🎉 RLinf officially releases version [v0.1](https://github.com/RLinf/RLinf/releases/tag/v0.1).
 - [2025/11] 🔥 Reinforcement learning fine-tuning based on [CALVIN](https://github.com/mees/calvin) is now available! Docs: [RL on CALVIN](https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/calvin.html)
 - [2025/11] 🔥 Reinforcement learning fine-tuning based on [IsaacLab](https://github.com/isaac-sim/IsaacLab) is now available! Docs: [RL on IsaacLab](https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/isaaclab.html)
@@ -70,8 +71,7 @@ RLinf 是一个灵活且可扩展的开源框架,专为利用强化学习进
         <li><a href="https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/behavior.html">BEHAVIOR</a> ✅</li>
         <li><a href="https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/metaworld.html">MetaWorld</a> ✅</li>
         <li><a href="https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/isaaclab.html">IsaacLab</a> ✅</li>
-        <li><a href="https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/calvin.html">CALVIN</a> ✅</li>
-        <li>RoboCasa</li>
+        <li><a href="https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/robocasa.html">RoboCasa</a> ✅</li>
         <li>More...</li>
       </ul>
     </td>
@@ -89,7 +89,7 @@ RLinf 是一个灵活且可扩展的开源框架,专为利用强化学习进
         <li><a href="https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/pi0.html">π₀.₅</a> ✅</li>
         <li><a href="https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/maniskill.html">OpenVLA</a> ✅</li>
         <li><a href="https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/libero.html">OpenVLA-OFT</a> ✅</li>
-        <li><a href="https://rlinf.readthedocs.io/en/latest/rst_source/examples/gr00t.html">GR00T</a> ✅</li>
+        <li><a href="https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/gr00t.html">GR00T</a> ✅</li>
       </ul>
       <li><b>VLM models</b></li>
       <ul>
@@ -565,7 +565,7 @@ RLinf 是一个灵活且可扩展的开源框架,专为利用强化学习进
 - [ ] Support for deep searcher agent training
 
 - [ ] Support for multi-agent training
-- [ ] Support for integration with more embodied simulators (e.g., [RoboCasa](https://github.com/robocasa/robocasa), [GENESIS](https://github.com/Genesis-Embodied-AI/Genesis), [RoboTwin](https://github.com/RoboTwin-Platform/RoboTwin))
+- [ ] Support for integration with more embodied simulators (e.g., [GENESIS](https://github.com/Genesis-Embodied-AI/Genesis), [RoboTwin](https://github.com/RoboTwin-Platform/RoboTwin))
 - [ ] Support for more VLA models (e.g., [WALL-OSS](https://huggingface.co/x-square-robot/wall-oss-flow))
 - [ ] Support for world model (World Model)
 

docker/Dockerfile

Lines changed: 11 additions & 0 deletions
@@ -139,6 +139,17 @@ RUN link_assets
 # Set default env
 RUN echo "source ${UV_PATH}/openpi/bin/activate" >> ~/.bashrc
 
+FROM embodied-common-image AS embodied-robocasa-image
+
+# Install openpi env
+RUN bash requirements/install.sh embodied --venv openpi --model openpi --env robocasa
+
+RUN source switch_env openpi && download_assets --dir /opt/assets --assets openpi
+RUN link_assets
+
+# Set default env
+RUN echo "source ${UV_PATH}/openpi/bin/activate" >> ~/.bashrc
+
 FROM embodied-common-image AS embodied-isaaclab-image
 
 # Install gr00t env

docs/source-en/rst_source/examples/index.rst

Lines changed: 1 addition & 0 deletions
@@ -252,6 +252,7 @@ Thanks to this decoupled design, workers can be flexibly and dynamically schedul
    metaworld
    isaaclab
    calvin
+   robocasa
    pi0
    gr00t
    reasoning
Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
RL with RoboCasa Benchmark
====================================

.. |huggingface| image:: /_static/svg/hf-logo.svg
   :width: 16px
   :height: 16px
   :class: inline-icon

This document provides a comprehensive guide to reinforcement learning training with the RoboCasa benchmark in the RLinf framework.
RoboCasa is a large-scale robot-learning simulation framework focused on manipulation tasks in kitchen environments, featuring diverse kitchen layouts, objects, and manipulation tasks.

RoboCasa combines realistic kitchen environments with diverse manipulation challenges, making it an ideal benchmark for developing generalizable robotic policies.
The main goal is to train vision-language-action models capable of the following:

1. **Visual Understanding**: Process RGB images from multiple camera viewpoints.
2. **Language Understanding**: Interpret natural language task instructions.
3. **Manipulation Skills**: Execute complex kitchen tasks such as pick-and-place, opening/closing doors, and appliance control.
Environment Overview
--------------------

**RoboCasa Simulation Platform**

- **Environment**: RoboCasa Kitchen simulation environment (built on robosuite)
- **Robot**: Panda manipulator on a mobile base (PandaOmron), equipped with a parallel gripper
- **Tasks**: 24 atomic kitchen tasks covering multiple categories (excluding the NavigateKitchen task, which requires moving the base)
- **Observation**: Multi-view RGB images (robot view + wrist camera) plus proprioceptive state
- **Action Space**: 12-dimensional continuous actions

  - 3D arm position delta
  - 3D arm rotation delta
  - 1D gripper control (open/close)
  - 4D base control
  - 1D mode selection (control base or arm)
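The 12-dimensional layout above can be sketched as named slices into a flat action vector. This is a hypothetical helper for illustration only; the slice ordering is an assumption, so consult the RLinf/RoboCasa environment wrapper for the authoritative layout.

```python
import numpy as np

# Hypothetical slice layout for the 12-D action vector described above.
# The ordering here is assumed for illustration, not taken from the RLinf source.
ACTION_SLICES = {
    "arm_pos_delta": slice(0, 3),   # 3D arm position delta
    "arm_rot_delta": slice(3, 6),   # 3D arm rotation delta
    "gripper": slice(6, 7),         # 1D gripper control (open/close)
    "base": slice(7, 11),           # 4D base control
    "mode": slice(11, 12),          # 1D mode selection (base or arm)
}

def split_action(action: np.ndarray) -> dict:
    """Split a flat 12-D action into its named components."""
    assert action.shape == (12,), f"expected a 12-D action, got {action.shape}"
    return {name: action[s] for name, s in ACTION_SLICES.items()}

parts = split_action(np.zeros(12))
```

The component sizes sum to 12, matching the action-space description above.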
**Task Categories**

RoboCasa provides diverse atomic tasks organized into multiple categories:

*Door Manipulation Tasks*:

- ``OpenSingleDoor``: Open a cabinet or microwave door
- ``CloseSingleDoor``: Close a cabinet or microwave door
- ``OpenDoubleDoor``: Open double cabinet doors
- ``CloseDoubleDoor``: Close double cabinet doors
- ``OpenDrawer``: Open a drawer
- ``CloseDrawer``: Close a drawer

*Pick-and-Place Tasks*:

- ``PnPCounterToCab``: Pick from the counter and place into a cabinet
- ``PnPCabToCounter``: Pick from a cabinet and place on the counter
- ``PnPCounterToSink``: Pick from the counter and place in the sink
- ``PnPSinkToCounter``: Pick from the sink and place on the counter
- ``PnPCounterToStove``: Pick from the counter and place on the stove
- ``PnPStoveToCounter``: Pick from the stove and place on the counter
- ``PnPCounterToMicrowave``: Pick from the counter and place in the microwave
- ``PnPMicrowaveToCounter``: Pick from the microwave and place on the counter

*Appliance Control Tasks*:

- ``TurnOnMicrowave``: Turn on the microwave
- ``TurnOffMicrowave``: Turn off the microwave
- ``TurnOnSinkFaucet``: Turn on the sink faucet
- ``TurnOffSinkFaucet``: Turn off the sink faucet
- ``TurnSinkSpout``: Turn the sink spout
- ``TurnOnStove``: Turn on the stove
- ``TurnOffStove``: Turn off the stove

*Coffee Making Tasks*:

- ``CoffeeSetupMug``: Set up the coffee mug
- ``CoffeeServeMug``: Serve coffee into the mug
- ``CoffeePressButton``: Press the coffee machine button
**Observation Structure**

- **Base Camera Image** (``base_image``): Robot left-view camera (128×128 RGB)
- **Wrist Camera Image** (``wrist_image``): End-effector view camera (128×128 RGB)
- **Proprioceptive State** (``state``): 16-dimensional vector containing:

  - ``[0:2]`` Robot base position (x, y)
  - ``[2:5]`` Padding zeros
  - ``[5:9]`` End-effector quaternion relative to the base
  - ``[9:12]`` End-effector position relative to the base
  - ``[12:14]`` Gripper joint velocities
  - ``[14:16]`` Gripper joint positions
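The 16-dimensional state can be unpacked with the index ranges documented above. The helper below is a sketch written for this guide, not part of the RLinf API:

```python
import numpy as np

# Named views over the 16-D proprioceptive state, following the
# index ranges documented in the observation structure above.
STATE_SLICES = {
    "base_pos_xy": slice(0, 2),    # robot base position (x, y)
    "padding": slice(2, 5),        # padding zeros
    "eef_quat": slice(5, 9),       # end-effector quaternion w.r.t. base
    "eef_pos": slice(9, 12),       # end-effector position w.r.t. base
    "gripper_qvel": slice(12, 14), # gripper joint velocities
    "gripper_qpos": slice(14, 16), # gripper joint positions
}

def unpack_state(state: np.ndarray) -> dict:
    """Split a [..., 16] state array into named components."""
    assert state.shape[-1] == 16, "state must have 16 dimensions"
    return {name: state[..., s] for name, s in STATE_SLICES.items()}
```

Because the slices index the last axis, the same helper works on a single state vector or a ``[batch_size, 16]`` tensor.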
**Data Structure**

- **Images**: Base camera RGB tensor ``[batch_size, 3, 128, 128]`` and wrist camera tensor ``[batch_size, 3, 128, 128]``
- **State**: Proprioceptive state tensor ``[batch_size, 16]``
- **Task Description**: Natural language instructions
- **Actions**: 7-dimensional continuous actions (position, quaternion, gripper)
- **Reward**: Sparse reward based on task completion
Algorithm
---------

**Core Algorithm Components**

1. **PPO (Proximal Policy Optimization)**

   - Advantage estimation using GAE (Generalized Advantage Estimation)
   - Policy clipping with ratio limits
   - Value function clipping
   - Entropy regularization
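The "policy clipping with ratio limits" ingredient above is the clipped PPO surrogate. The following is a generic NumPy sketch of that loss, not RLinf's actual implementation (which additionally includes GAE, value clipping, and entropy terms):

```python
import numpy as np

def ppo_policy_loss(logp_new, logp_old, advantages, clip_ratio=0.2):
    """Clipped PPO surrogate loss (generic sketch, minimized by gradient descent).

    logp_new / logp_old: per-action log-probabilities under the current
    and behavior policies; advantages: per-action advantage estimates.
    """
    # Importance ratio pi_new(a|s) / pi_old(a|s)
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    unclipped = ratio * advantages
    # Clip the ratio to [1 - eps, 1 + eps] to limit the policy update
    clipped = np.clip(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * advantages
    # Pessimistic (min) bound, negated so lower loss = higher objective
    return -np.mean(np.minimum(unclipped, clipped))
```

When the new and old policies coincide the ratio is 1 everywhere, so the loss reduces to the negated mean advantage.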
2. **GRPO (Group Relative Policy Optimization)**

   - For every state/prompt, the policy generates *G* independent actions
   - The advantage of each action is computed by subtracting the group's mean reward
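The group-relative advantage described above is simple enough to sketch directly. This is an illustrative NumPy version; RLinf's implementation may differ in details such as reward normalization:

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Group-relative advantages, as described above.

    rewards: array of shape [num_prompts, G], holding the reward of each
    of the G rollouts generated for every state/prompt. The advantage of
    each action is its reward minus the mean reward of its group.
    """
    return rewards - rewards.mean(axis=1, keepdims=True)
```

By construction the advantages within each group sum to zero, so better-than-average rollouts are reinforced and worse-than-average ones are suppressed.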
Dependency Installation
-----------------------

**Option 1: Docker Image**

Use the Docker image ``rlinf/rlinf:agentic-rlinf0.1-robocasa`` for the experiment.

**Option 2: Custom Environment**

Install the dependencies directly in your environment by running:

.. code:: bash

   pip install uv
   bash requirements/install.sh embodied --model openpi --env robocasa
   source .venv/bin/activate
Dataset Download
----------------

.. code:: bash

   python -m robocasa.scripts.download_kitchen_assets  # Caution: the assets to download are around 5 GB
Model Download
--------------

.. code-block:: bash

   # Download the model (choose either method)
   # Method 1: Using git clone
   git lfs install
   git clone https://huggingface.co/RLinf/RLinf-Pi0-RoboCasa

   # Method 2: Using huggingface-hub
   pip install huggingface-hub
   hf download RLinf/RLinf-Pi0-RoboCasa

docs/source-zh/rst_source/examples/index.rst

Lines changed: 1 addition & 0 deletions
@@ -247,6 +247,7 @@ RLinf的整体设计简洁且模块化,以Worker为抽象封装强化学习训
    metaworld
    isaaclab
    calvin
+   robocasa
    pi0
    gr00t
    reasoning
