simpler-env · k-makihara · Aug 7, 2025 · Aug 14, 2025 · Aug 14, 2025 · Aug 14, 2025
diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json
@@ -0,0 +1,32 @@
+{
+    // docker build . -t simpler_env
+    "image": "simpler_env",
+    "runArgs": [
+        "--name=simpler_env-ynn",
+        "--net=host",
+        "--ipc=host",
+        "--gpus",
+        "all",
+        "-e",
+        "NVIDIA_DRIVER_CAPABILITIES=all",
+        "--privileged"
+    ],
+    "mounts": [
+        "source=/home/takanori.yoshimoto/code/SimplerEnv,target=/app/SimplerEnv,type=bind",
+        "source=/home/takanori.yoshimoto/code/hsr_openpi/checkpoints,target=/data/checkpoints,type=bind"
+    ],
+    "workspaceFolder": "/app/SimplerEnv",
+    "customizations": {
+        "vscode": {
+            "extensions": [
+                "ms-python.python",
+                "ms-python.vscode-pylance",
+                "ms-python.black-formatter",
+                "ms-python.isort",
+                "ms-azuretools.vscode-docker",
+                "github.copilot",
+                "github.vscode-pull-request-github"
+            ]
+        }
+    }
+}
diff --git a/.gitignore b/.gitignore
@@ -3,6 +3,9 @@ __pycache__
 _ext
 *.pyc
 *.so
+.venv/
+results/
+checkpoints/
 build/
 dist/
 *.egg-info/

diff --git a/.gitmodules b/.gitmodules
@@ -1,3 +1,9 @@
 [submodule "ManiSkill2_real2sim"]
 	path = ManiSkill2_real2sim
-	url = https://github.com/simpler-env/ManiSkill2_real2sim
+	url = https://github.com/airoa-org/ManiSkill2_real2sim.git
+	# url = https://github.com/allenzren/ManiSkill2_real2sim.git
+	# url = https://github.com/simpler-env/ManiSkill2_real2sim
+[submodule "Isaac-GR00T"]
+	path = Isaac-GR00T
+	url = https://github.com/NVIDIA/Isaac-GR00T.git
+	branch = main
diff --git a/Isaac-GR00T b/Isaac-GR00T
diff --git a/ManiSkill2_real2sim b/ManiSkill2_real2sim
diff --git a/README.md b/README.md
@@ -1,12 +1,12 @@
-# SimplerEnv: Simulated Manipulation Policy Evaluation Environments for Real Robot Setups
+# SimplerEnv: Simulated Manipulation Policy Evaluation Environments for Real Robot Setups (Multi-model Support 🔥)
 
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/simpler-env/SimplerEnv/blob/main/example.ipynb)
 
 ![](./images/teaser.png)
 
 Significant progress has been made in building generalist robot manipulation policies, yet their scalable and reproducible evaluation remains challenging, as real-world evaluation is operationally expensive and inefficient. We propose employing physical simulators as efficient, scalable, and informative complements to real-world evaluations. These simulation evaluations offer valuable quantitative metrics for checkpoint selection, insights into potential real-world policy behaviors or failure modes, and standardized setups to enhance reproducibility.
 
-This repository's code is based in the [SAPIEN](https://sapien.ucsd.edu/) simulator and the CPU based [ManiSkill2](https://maniskill2.github.io/) benchmark. We have also integrated the Bridge dataset environments into ManiSkill3, which offers GPU parallelization and can run 10-15x faster than the ManiSkill2 version. For instructions on how to use the GPU parallelized environments and evaluate policies on them, see: https://github.com/simpler-env/SimplerEnv/tree/maniskill3
+This repository is based on the [SAPIEN](https://sapien.ucsd.edu/) simulator and the [ManiSkill2](https://maniskill2.github.io/) benchmark (we will also integrate the evaluation envs into ManiSkill3 once it is complete).
 
 This repository encompasses 2 real-to-sim evaluation setups:
 - `Visual Matching` evaluation: Matching real & sim visual appearances for policy evaluation by overlaying real-world images onto simulation backgrounds and adjusting foreground object and robot textures in simulation.
@@ -23,12 +23,45 @@ We hope that our work guides and inspires future real-to-sim evaluation efforts.
   - [Code Structure](#code-structure)
   - [Adding New Policies](#adding-new-policies)
   - [Adding New Real-to-Sim Evaluation Environments and Robots](#adding-new-real-to-sim-evaluation-environments-and-robots)
-  - [Full Installation (RT-1 and Octo Inference, Env Building)](#full-installation-rt-1-and-octo-inference-env-building)
+  - [Full Installation (RT-1, Octo, OpenVLA Inference, Env Building)](#full-installation-rt-1-octo-openvla-inference-env-building)
     - [RT-1 Inference Setup](#rt-1-inference-setup)
     - [Octo Inference Setup](#octo-inference-setup)
+    - [OpenVLA Inference Setup](#openvla-inference-setup)
   - [Troubleshooting](#troubleshooting)
   - [Citation](#citation)
 
+## Benchmark @ GoogleSheets
+> [!TIP]
+> We maintain a public Google Sheet that documents the performance and fine-tuned weights of the latest SOTA models on SimplerEnv, making community benchmarking more accessible. Contributions and updates are welcome!
+>
+> [simpler env benchmark @ GoogleSheets 📊](https://docs.google.com/spreadsheets/d/1cLhEW9QnVkP4rqxsFVzdBVRyBWVdSm0d5zp1L_-QJx4/edit?usp=sharing)
+<img width="1789" alt="image" src="https://github.com/user-attachments/assets/68e2ad3d-b24f-4562-97d9-434f23cede86" />
+
+
+
+## Models
+> [!NOTE]
+> Hello everyone!
+> Issues and discussions are now fully open in this repository. We warmly welcome you to: 🤗
+> - Discuss any problems you encounter 🙋
+> - Submit fixes ✅
+> - Support new models! 🚀
+> Given the significant environmental differences across various models and the specific dependencies required for simulator rendering, I will soon provide a Docker solution and a benchmark performance table. I’ll also do my best to address any issues you run into.
+> Thank you for your support and contributions! 🎉
+>
+> To support state input, we use the submodule `ManiSkill2_real2sim` from https://github.com/allenzren/ManiSkill2_real2sim
+
+| Model Name   | Supported | Notes  |
+| -----------  | -----  | -----  |
+| Octo         | ✅     |        |
+| RT1          | ✅     |        |
+| OpenVLA      | ✅     |        |
+| CogACT       | ✅     | OpenVLA-based         |
+| SpatialVLA   | ✅     | [transformers == 4.47.0](https://github.com/SpatialVLA/SpatialVLA) |
+| Pi0/Pi0-Fast (openpi version) | ✅     | [openpi](https://github.com/Physical-Intelligence/openpi) |
+| Pi0/Pi0-Fast (lerobot version)     | ✅ | [lerobot](https://github.com/huggingface/lerobot) |
+| GR00T     | ✅ | [Isaac-GR00T](https://github.com/NVIDIA/Isaac-GR00T) |
+
 
 ## Getting Started
 
@@ -71,13 +104,13 @@ Prerequisites:
 
 Create an anaconda environment:
 ```
-conda create -n simpler_env python=3.10 (3.10 or 3.11)
+conda create -n simpler_env python=3.10  # Python 3.10 or newer should be fine
 conda activate simpler_env
 ```
 
 Clone this repo:
 ```
-git clone https://github.com/simpler-env/SimplerEnv --recurse-submodules
+git clone https://github.com/simpler-env/SimplerEnv --recurse-submodules --depth 1
 ```
 
 Install numpy<2.0 (otherwise errors in IK might occur in pinocchio):
@@ -97,15 +130,15 @@ cd {this_repo}
 pip install -e .
 ```
 
-**If you'd like to perform evaluations on our provided agents (e.g., RT-1, Octo), or add new robots and environments, please additionally follow the full installation instructions [here](#full-installation-rt-1-and-octo-inference-env-building).**
+**If you'd like to perform evaluations on our provided agents (e.g., RT-1, Octo, OpenVLA), or add new robots and environments, please additionally follow the full installation instructions [here](#full-installation-rt-1-octo-openvla-inference-env-building).**
 
 
 ## Examples
 
-- Simple RT-1 and Octo evaluation script on prepackaged environments with visual matching evaluation setup: see [`simpler_env/simple_inference_visual_matching_prepackaged_envs.py`](https://github.com/simpler-env/SimplerEnv/blob/main/simpler_env/simple_inference_visual_matching_prepackaged_envs.py).
+- Simple RT-1, Octo, and OpenVLA evaluation script on prepackaged environments with visual matching evaluation setup: see [`simpler_env/simple_inference_visual_matching_prepackaged_envs.py`](https://github.com/simpler-env/SimplerEnv/blob/main/simpler_env/simple_inference_visual_matching_prepackaged_envs.py).
 - Colab notebook for RT-1 and Octo inference: see [this link](https://colab.research.google.com/github/simpler-env/SimplerEnv/blob/main/example.ipynb).
 - Environment interactive visualization and manual control: see [`ManiSkill2_real2sim/mani_skill2_real2sim/examples/demo_manual_control_custom_envs.py`](https://github.com/simpler-env/ManiSkill2_real2sim/blob/main/mani_skill2_real2sim/examples/demo_manual_control_custom_envs.py)
-- Policy inference scripts to reproduce our Google Robot and WidowX real-to-sim evaluation results with sweeps over object / robot poses and advanced loggings. These contain both visual matching and variant aggregation evaluation setups along with RT-1, RT-1-X, and Octo policies. See [`scripts/`](https://github.com/simpler-env/SimplerEnv/tree/main/scripts).
+- Policy inference scripts to reproduce our Google Robot and WidowX real-to-sim evaluation results with sweeps over object / robot poses and advanced loggings. These contain both visual matching and variant aggregation evaluation setups along with RT-1, RT-1-X, Octo, and OpenVLA policies. See [`scripts/`](https://github.com/simpler-env/SimplerEnv/tree/main/scripts).
 - Real-to-sim evaluation videos from running `scripts/*.sh`: see [this link](https://huggingface.co/datasets/xuanlinli17/simpler-env-eval-example-videos/tree/main).
 
 ## Current Environments
@@ -183,6 +216,7 @@ simpler_env/
    policies/: policy implementations
       rt1/: RT-1 policy implementation
       octo/: Octo policy implementation
+      openvla/: OpenVLA policy implementation
    utils/:
       env/: environment building and observation utilities
       debug/: debugging tools for policies and robots
@@ -205,7 +239,7 @@ scripts/: example bash scripts for policy inference under our variant aggregatio
 
 If you want to use existing environments for evaluating new policies, you can keep `./ManiSkill2_real2sim` as is.
 
-1. Implement new policy inference scripts in `simpler_env/policies/{your_new_policy}`, following the examples for RT-1 (`simpler_env/policies/rt1`) and Octo (`simpler_env/policies/octo`) policies.
+1. Implement new policy inference scripts in `simpler_env/policies/{your_new_policy}`, following the examples for RT-1 (`simpler_env/policies/rt1`), Octo (`simpler_env/policies/octo`), and OpenVLA (`simpler_env/policies/openvla`) policies.
 2. You can now use `simpler_env/simple_inference_visual_matching_prepackaged_envs.py` to perform policy evaluations in simulation.
    - If the policy behaviors deviate a lot from those in the real-world, you can write similar scripts as in `simpler_env/utils/debug/{policy_name}_inference_real_video.py` to debug the policy behaviors. The debugging script performs policy inference by feeding real eval video frames into the policy. If the policy behavior still deviates significantly from real, this may suggest that policy actions are processed incorrectly into the simulation environments. Please double check action orderings and action spaces.
 3. If you'd like to perform customized evaluations,
@@ -219,7 +253,7 @@ If you want to use existing environments for evaluating new policies, you can ke
 We provide a step-by-step guide to add new real-to-sim evaluation environments and robots in [this README](ADDING_NEW_ENVS_ROBOTS.md)
 
 
-## Full Installation (RT-1 and Octo Inference, Env Building)
+## Full Installation (RT-1, Octo, OpenVLA Inference, Env Building)
 
 If you'd like to perform evaluations on our provided agents (e.g., RT-1, Octo), or add new robots and environments, please follow the full installation instructions below.
 
@@ -289,6 +323,13 @@ If you are using CUDA 12, then to use GPU for Octo inference, you need CUDA vers
 
 `PATH=/usr/local/cuda-12.3/bin:$PATH   LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64:$LD_LIBRARY_PATH   bash scripts/octo_xxx_script.sh`
 
+### OpenVLA Inference Setup
+
+```
+pip install torch==2.3.1 torchvision==0.18.1 timm==0.9.10 tokenizers==0.15.2 accelerate==0.32.1
+pip install flash-attn==2.6.1 --no-build-isolation
+```
+
 ## Troubleshooting
 
 1. If you encounter issues such as
@@ -307,6 +348,10 @@ Follow [this link](https://maniskill.readthedocs.io/en/latest/user_guide/getting
 TypeError: 'NoneType' object is not subscriptable
 ```
 
+3. Please also refer to the original repo or [vulkan_setup](https://github.com/SpatialVLA/SpatialVLA/issues/3#issuecomment-2641739404) if you encounter any problems.
+
+4. Does `tensorflow-2.15.0` conflict with `tensorflow-2.15.1`?
+The dlimp library has not been maintained for a long time, so its pinned TensorFlow version may be out of date. A reliable workaround is to comment out `tensorflow==2.15.0` in the requirements file, install all other dependencies, and then install `tensorflow==2.15.0` last. So far, using `tensorflow==2.15.0` has not caused any problems.
 
 ## Citation
 

diff --git a/docker/10_nvidia.json b/docker/10_nvidia.json
@@ -0,0 +1,6 @@
+{
+    "file_format_version": "1.0.0",
+    "ICD": {
+        "library_path": "libEGL_nvidia.so.0"
+    }
+}
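Reviewer sketch (not part of the PR): the vendor file above is small, but a typo in it can leave the EGL loader unable to find the NVIDIA library. Below is a minimal Python check, assuming only the two keys visible in `docker/10_nvidia.json`; `validate_icd_manifest` is a hypothetical helper name.

```python
import json
from pathlib import Path


def validate_icd_manifest(path):
    """Return the library path named by an ICD-style JSON manifest.

    Hypothetical helper: it checks only the keys that appear in
    docker/10_nvidia.json and raises ValueError if one is missing.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if "file_format_version" not in data:
        raise ValueError(f"{path}: missing 'file_format_version'")
    icd = data.get("ICD")
    if not isinstance(icd, dict) or not icd.get("library_path"):
        raise ValueError(f"{path}: missing 'ICD.library_path'")
    return icd["library_path"]
```

Inside the container this could be pointed at `/usr/share/glvnd/egl_vendor.d/10_nvidia.json`, the destination used by the Dockerfile's `COPY` line. It does not confirm that the named library can actually be loaded.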
diff --git a/docker/Dockerfile b/docker/Dockerfile
@@ -0,0 +1,93 @@
+FROM nvidia/cuda:12.1.0-devel-ubuntu22.04
+
+# Basic tools
+ENV DEBIAN_FRONTEND=noninteractive \
+    NVIDIA_DRIVER_CAPABILITIES=all \
+    TZ=Asia/Tokyo \
+    PYTHONUNBUFFERED=1 \
+    VK_ICD_FILENAMES=/etc/vulkan/icd.d/nvidia_icd.json
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    bash-completion build-essential ca-certificates cmake curl git git-lfs \
+    htop libegl1 libxext6 libjpeg-dev libpng-dev libvulkan1 rsync \
+    tmux unzip vim wget xvfb pkg-config ffmpeg \
+    libglvnd-dev libgl1-mesa-dev libegl1-mesa-dev libgles2-mesa-dev libglib2.0-0 \
+    libsm6 libxrender1 libgomp1 libglu1-mesa libxi6 software-properties-common \
+    python3.11 python3.11-dev python3-pip && \
+    rm -rf /var/lib/apt/lists/*
+
+# Set Python 3.11 as the default
+RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1 && \
+    update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1 && \
+    python3 -m pip install --no-cache-dir --upgrade pip setuptools wheel
+
+# Enable Git LFS
+RUN git lfs install
+
+# Copy the Vulkan/EGL configuration files
+COPY docker/nvidia_icd.json /etc/vulkan/icd.d/nvidia_icd.json
+COPY docker/nvidia_layers.json /etc/vulkan/implicit_layer.d/nvidia_layers.json
+COPY docker/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json
+
+# Working directory
+WORKDIR /workspace
+
+# Copy the source code (needed for installation at build time)
+# Note: at runtime, run_docker.sh mounts the host's code, which overrides this copy
+COPY . /workspace/
+
+# Install SimplerEnv and ManiSkill2_real2sim
+RUN cd /workspace/ManiSkill2_real2sim && pip install --no-cache-dir -e . && cd /workspace
+RUN pip install --no-cache-dir -e /workspace
+RUN pip install --no-cache-dir -r /workspace/requirements_full_install.txt
+
+# PyTorch and torchvision (CUDA 12.1)
+RUN pip install --no-cache-dir --index-url https://download.pytorch.org/whl/cu121 \
+    torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1
+
+# Flash Attention
+RUN pip install --no-cache-dir --no-deps \
+    https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.1.post4/flash_attn-2.7.1.post4+cu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
+
+# Isaac-GR00T base dependencies (--no-deps avoids duplicate installs)
+RUN cd /workspace/Isaac-GR00T && pip install --no-cache-dir -e .[base] --no-deps && cd /workspace
+
+# Additional libraries (explicitly install dependencies that tend to cause problems)
+RUN pip install --no-cache-dir \
+    pandas==2.2.3 \
+    "pydantic>=2,<3" typing_extensions==4.12.2 --force-reinstall
+
+RUN pip install --no-cache-dir \
+    albumentations==1.4.18 albucore==0.0.17 scikit-image==0.25.2 lazy_loader==0.4 --no-deps
+
+RUN pip install --no-cache-dir \
+    decord==0.6.0 av==12.3.0 --no-deps
+
+RUN pip install --no-cache-dir \
+    nptyping==2.5.0 numpydantic==1.6.10 --no-deps
+
+RUN pip install --no-cache-dir \
+    diffusers==0.30.2 timm==1.0.14 peft==0.14.0
+
+RUN pip install --no-cache-dir \
+    transformers==4.51.3 --force-reinstall --no-deps
+
+RUN pip install --no-cache-dir \
+    pyzmq --no-deps
+
+RUN pip install --no-cache-dir \
+    "tokenizers>=0.21,<0.22" --no-deps
+
+# PyTorch3D (built from source)
+RUN pip install --no-cache-dir "git+https://github.com/facebookresearch/pytorch3d.git"
+
+# Enforce NumPy < 2.0 again (PyTorch3D may upgrade it)
+RUN pip install --no-cache-dir --force-reinstall "numpy>=1.24.4,<2.0"
+
+# Copy the entrypoint script
+COPY docker/entrypoint.sh /usr/local/bin/entrypoint.sh
+RUN chmod +x /usr/local/bin/entrypoint.sh
+
+# Set the entrypoint (automatically copies configuration files at startup)
+ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
+CMD ["bash"]
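Reviewer sketch (not part of the PR): the Dockerfile above mixes `--no-deps` and `--force-reinstall` installs, so a later layer could leave a package at a version other than the one pinned earlier. Below is a minimal Python check using the standard library's `importlib.metadata`. The pins are copied from the Dockerfile; `EXPECTED_PINS` and `find_pin_mismatches` are hypothetical names, and only exact `==` pins are handled.

```python
from importlib import metadata

# Exact pins copied from the Dockerfile above; extend as needed.
EXPECTED_PINS = {
    "torch": "2.5.1",
    "torchvision": "0.20.1",
    "transformers": "4.51.3",
    "diffusers": "0.30.2",
    "timm": "1.0.14",
    "peft": "0.14.0",
}


def find_pin_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that does not match.

    `found` is None when the package is not installed. A local build tag
    such as "+cu121" is ignored when comparing.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_pin_mismatches(EXPECTED_PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

Run inside the built container, the script prints nothing when every listed package matches its pin. Range constraints such as `numpy>=1.24.4,<2.0` would need a real version parser and are deliberately left out.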
diff --git a/docker/README.md b/docker/README.md
@@ -0,0 +1,83 @@
+# SimplerEnv Docker Environment
+
+This directory contains scripts for building the Docker environment for SimplerEnv (benchmark-v2).
+
+## Prerequisites
+
+- NVIDIA Driver >= 535 (CUDA 12.1 equivalent)
+- Docker installed
+- nvidia-docker (NVIDIA Container Toolkit) installed
+- A Linux node with a GPU
+
+## Quick Start
+
+### 1. Build the Docker image
+
+Run the following from the repository root directory.
+
+```bash
+bash docker/build_docker.sh
+```
+
+The build takes roughly 30 to 60 minutes and installs the following:
+
+- Python 3.11
+- PyTorch 2.5.1 (CUDA 12.1)
+- SimplerEnv and ManiSkill2_real2sim
+- Isaac-GR00T and all of its dependencies
+- Flash Attention 2.7.1
+- PyTorch3D
+- Other required libraries
+
+### 2. Start the container
+
+```bash
+bash docker/run_docker.sh
+```
+
+### 3. Enter the container
+
+```bash
+docker exec -it simplerenv bash
+```
+
+## Usage
+
+All dependencies are preinstalled inside the container, so you can run the evaluation scripts directly.
+
+### Evaluation on WidowX (Bridge)
+
+```bash
+python scripts/gr00t/evaluate_bridge.py \
+  --ckpt-path /path/to/checkpoint-group6/
+```
+
+### Evaluation on Google Robot (Fractal)
+
+```bash
+python scripts/gr00t/evaluate_fractal.py \
+  --ckpt-path /path/to/checkpoint-group6/
+```
+
+## Technical Details
+
+### Main installed packages
+
+- **PyTorch**: 2.5.1 (CUDA 12.1)
+- **Flash Attention**: 2.7.1.post4
+- **Transformers**: 4.51.3
+- **Diffusers**: 0.30.2
+- **Timm**: 1.0.14
+- **PEFT**: 0.14.0
+- **PyTorch3D**: latest from source
+- **Others**: pandas, pydantic, albumentations, decord, av, nptyping, numpydantic, pyzmq, tokenizers
+
+### Dockerfile structure
+
+1. Base image: `nvidia/cuda:12.1.0-devel-ubuntu22.04`
+2. Installing system packages
+3. Setting up Python 3.11
+4. Installing SimplerEnv and ManiSkill2_real2sim
+5. Installing PyTorch and other deep learning libraries
+6. Installing the Isaac-GR00T dependencies
+7. Overwriting configuration files
+10 −1		mani_skill2_real2sim/agents/base_agent.py
+7 −1		mani_skill2_real2sim/agents/robots/googlerobot.py
+7 −2		mani_skill2_real2sim/agents/robots/widowx.py
+197 −108		mani_skill2_real2sim/envs/custom_scenes/grasp_single_in_scene.py
+1 −0		mani_skill2_real2sim/envs/custom_scenes/move_near_in_scene.py
+1 −0		mani_skill2_real2sim/envs/custom_scenes/open_drawer_in_scene.py
+25 −45		mani_skill2_real2sim/envs/custom_scenes/place_in_closed_drawer_in_scene.py