Contributors: Songlin Wei, Hongyi Jing, Boqian Li, Zhenyu Zhao, Jiageng Mao, Zhenhao Ni, Sicheng He, Jie Liu, Xiawei Liu, Kaidi Kang, Sheng Zang, Weiduo Yuan, Marco Pavone, Di Huang, Yue Wang
Our foundation model can acquire new long-horizon dexterous loco-manipulation skills by fine-tuning on as few as 80 trajectories. Our key finding is that scaling the right data in the right way is what matters.
Table of Contents
- Finetune Ψ₀ on Unitree G1 Humanoid Robot
- Baselines
- Simulation
- Reproduce Ψ₀: Pre-Training and Post-Training
- Checkpoints
- Troubleshooting
- Citation
Clone the project and change directory to the project root:
git clone git@github.com:physical-superintelligence-lab/Psi0.git
cd Psi0
We use uv to manage Python dependencies. Install uv if not already installed:
curl -LsSf https://astral.sh/uv/install.sh | sh
Set up the $\Psi_0$ environment:
ℹ️ We manage the $\Psi_0$ environment and all the baselines through uv, and they all share the same src/ code. See Environment Management for more details.
uv venv .venv-psi --python 3.10
source .venv-psi/bin/activate
GIT_LFS_SKIP_SMUDGE=1 uv sync --all-groups --index-strategy unsafe-best-match --active
uv pip install flash_attn==2.7.4.post1 --no-build-isolation
Test the installation; a version number should be displayed.
python -c "import psi;print(psi.__version__)"
Verify the shared lerobot stack is importable:
python -c "from psi.data.lerobot.compat import LEROBOT_LAYOUT; print(LEROBOT_LAYOUT)"
📂 We open-sourced all 9 real-world tasks. You can directly download the data and jump to Fine-Tuning.
🔥 We first release our internal data collection pipeline, which uses an Apple Vision Pro to teleoperate a Unitree G1 humanoid robot with two Dex3-1 hands.
See the detailed teleoperation guide here:
Real-World Teleoperation Guide
export task=Hug_box_and_move
hf download USC-PSI-Lab/psi-data \
g1_real_raw/$task.zip \
--local-dir=$PSI_HOME/data/real_teleop_g1 \
--repo-type=dataset
unzip $PSI_HOME/data/real_teleop_g1/g1_real_raw/$task.zip -d $PSI_HOME/data/real_teleop_g1/g1_real_raw/$task
You should see a folder structure similar to:
g1_real_raw
└── Hug_box_and_move
├── episode_0
│ ├── color
│ │ ├── frame_000000.jpg
│ │ └── ...
│ └── data.json
└── ...
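Before converting, it can be worth checking that every episode folder is complete. Below is a minimal sketch (assuming the layout shown above) that flags episodes missing a `data.json` or color frames; the tree it builds is a synthetic stand-in, so point `ROOT` at a real folder such as `$PSI_HOME/data/real_teleop_g1/g1_real_raw/$task` instead.

```shell
# Sanity-check sketch for a raw task folder: every episode_* directory should
# hold a data.json plus at least one color frame.
# The tree created here is synthetic; replace ROOT with your real task folder.
ROOT=$(mktemp -d)/Hug_box_and_move
mkdir -p "$ROOT/episode_0/color"
touch "$ROOT/episode_0/data.json" "$ROOT/episode_0/color/frame_000000.jpg"

ok=0; bad=0
for ep in "$ROOT"/episode_*; do
  frames=$(ls "$ep/color" 2>/dev/null | wc -l)
  if [ -f "$ep/data.json" ] && [ "$frames" -gt 0 ]; then
    ok=$((ok + 1))
  else
    bad=$((bad + 1)); echo "incomplete episode: $ep"
  fi
done
echo "complete=$ok incomplete=$bad"
```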
Edit the task description file using the following format, e.g.,
vim scripts/data/task_description_dict.json
{
"Hug_box_and_move": "Hug box and move."
}
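If you script your data pipeline, the entry can also be added programmatically rather than via `vim`. The sketch below uses a temp file in place of `scripts/data/task_description_dict.json` so it is self-contained; swap in the real path when using it.

```shell
# Sketch: add a task description entry without hand-editing the JSON.
# DICT uses a temp file here; in the repo it would be
# scripts/data/task_description_dict.json.
DICT=$(mktemp)
echo '{}' > "$DICT"
task=Hug_box_and_move
python3 - "$DICT" "$task" "Hug box and move." <<'EOF'
import json, sys
path, task, desc = sys.argv[1:4]
with open(path) as f:
    data = json.load(f)
data[task] = desc  # add or overwrite the entry for this task
with open(path, "w") as f:
    json.dump(data, f, indent=2)
EOF
cat "$DICT"
```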
Run the conversion script:
python scripts/data/raw_to_lerobot.py \
--data-root=$PSI_HOME/data/real_teleop_g1/g1_real_raw \
--work-dir=$PSI_HOME/data/real \
--repo-id=psi0-real-g1 \
--robot-type=g1 \
--task=$task
Calculate the modality statistics:
python scripts/data/calc_modality_stats.py \
--work-dir=$PSI_HOME/data/real \
--task=$task
Create stats_psi0.json from the computed stats:
cp $PSI_HOME/data/real/$task/meta/stats.json $PSI_HOME/data/real/$task/meta/stats_psi0.json
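A quick check after the copy can catch silent failures. The sketch below verifies the two files are byte-identical and that the result parses as JSON; it uses a temp directory with a toy stats file in place of `$PSI_HOME/data/real/$task/meta`.

```shell
# Sketch: confirm the copied stats file is byte-identical and valid JSON.
# A temp dir with a toy stats file stands in for $PSI_HOME/data/real/$task/meta.
meta=$(mktemp -d)
echo '{"state": {"mean": [0.0], "std": [1.0]}}' > "$meta/stats.json"
cp "$meta/stats.json" "$meta/stats_psi0.json"
cmp -s "$meta/stats.json" "$meta/stats_psi0.json" && echo "copy OK"
python3 -c 'import json,sys; json.load(open(sys.argv[1])); print("valid JSON")' \
  "$meta/stats_psi0.json"
```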
Now the data is ready for fine-tuning.
✈️ If the training environment is already configured, directly launch training via scripts/train/psi0/finetune-real-psi0.sh $task
✔️ Assuming the data is already collected and processed, we can now fine-tune the
$\Psi_0$ model.
📝 Here we illustrate using the pre-collected data from Huggingface psi-data.
Set up the environment variables following .env.sample. They are loaded by dotenv.load_dotenv() in Python.
cp .env.sample .env
# and edit the following env variables
# HF_TOKEN=<YOUR HF READ TOKEN>
# WANDB_API_KEY=<API KEY for wandb logging>
# WANDB_ENTITY=<wandb entity>
# PSI_HOME=<Path where PSI cache/checkpoint/data are located by convention>
source .env
echo $PSI_HOME
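Since a missing variable only surfaces later as an opaque training or download failure, a small pre-flight check can help. This is a sketch; the variable list simply mirrors `.env.sample` above, so extend it if your setup defines more.

```shell
# Sketch: fail fast if any variable from .env is still unset.
# The list mirrors .env.sample; extend as needed.
missing=0
for v in HF_TOKEN WANDB_API_KEY WANDB_ENTITY PSI_HOME; do
  eval val=\$$v            # indirect lookup of the variable named in $v
  if [ -z "$val" ]; then
    echo "missing: $v"
    missing=$((missing + 1))
  fi
done
echo "$missing variable(s) missing"
```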
Download the collected real-world data and extract it:
export task=Pick_bottle_and_turn_and_pour_into_cup
hf download USC-PSI-Lab/psi-data \
real/$task.zip \
--local-dir=$PSI_HOME/data \
--repo-type=dataset
unzip $PSI_HOME/data/real/$task.zip -d $PSI_HOME/data/real
👀 If you want to visualize the episodes, please refer to Data Visualization in the examples.
Launch the training script:
scripts/train/psi0/finetune-real-psi0.sh $task
🖥️ You can always change the GPUs, e.g.,
CUDA_VISIBLE_DEVICES=0,1,2,3 scripts/train/....
⚠️ Please maintain a reasonable global batch size = device batch size x number of GPUs x gradient accumulation steps. We use a global batch size of 128 throughout all real-world and simulation experiments.
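The relationship above can be checked with a few lines of shell before launching. The three values below are illustrative placeholders; match them to the flags in your launch script.

```shell
# Sketch: sanity-check the effective global batch size before launching.
# DEVICE_BS, NUM_GPUS and GRAD_ACCUM are illustrative values.
DEVICE_BS=4
NUM_GPUS=8
GRAD_ACCUM=4
GLOBAL_BS=$((DEVICE_BS * NUM_GPUS * GRAD_ACCUM))
echo "global batch size = $GLOBAL_BS"
if [ "$GLOBAL_BS" -ne 128 ]; then
  echo "warning: expected 128, got $GLOBAL_BS"
fi
```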
Follow the steps in
examples/simple/openloop_eval.ipynb
to load the training dataset and run model inference to see how well the model fits the training data.
bash ./scripts/deploy/serve_psi0-rtc.sh
bash ./real/scripts/deploy_psi0-rtc.sh
For detailed real-world deployment environment setup, please also refer to the dedicated documentation:
Real-World Teleoperation Guide
Install the env:
cd src/gr00t; uv sync
- training
cd src/gr00t
./scripts/train_gr00t.sh --dataset-path /your/lerobot/dataset
- serving a checkpoint
cd src/gr00t
./scripts/deploy_gr00t.sh
- open-loop eval on a trained checkpoint using ground truth
cd src/gr00t
./scripts/openloop_eval.sh
Install the env:
cd src/InternVLA-M1; uv sync --python 3.10
- training
cd src/InternVLA-M1
bash scripts/train_internvla.sh
- serving a checkpoint
cd src/InternVLA-M1
./scripts/deploy_internvla.sh
We use SIMPLE to benchmark $\Psi_0$ and the baselines in simulation.
📢 SIMPLE is an easy-to-use humanoid benchmarking simulator built on the MuJoCo physics engine and Isaac Sim rendering.
[Coming soon]
📂 We also provide 5 pre-collected whole-body humanoid loco-manipulation tasks on Huggingface psi-data. If you want to use the existing simulation data, jump to Fine-Tuning.
[Coming soon]
[Coming soon]
Download SIMPLE task data and extract it:
💡 Don't forget to source .env before running the commands below.
export task=G1WholebodyBendPick-v0
hf download USC-PSI-Lab/psi-data \
simple/$task.zip \
--local-dir=$PSI_HOME/data \
--repo-type=dataset
unzip $PSI_HOME/data/simple/$task.zip -d $PSI_HOME/data/simple
👀 If you want to visualize the episodes, please refer to Data Visualization in the examples.
Start training:
Please set up the environment variables if you have not done so yet.
bash scripts/train/psi0/finetune-simple-psi0.sh $task
The training will create a run dir which is located under .runs in the project root.
If your GPU has limited VRAM, set --train.optimizer-foreach=false to reduce optimizer-step memory usage at the cost of some speed.
export run_dir=<the run dir here under folder .runs>
export ckpt_step=<checkpoint step>
uv run --active --group psi --group serve serve_psi0 \
--host 0.0.0.0 \
--port 22085 \
--run-dir=$run_dir \
--ckpt-step=$ckpt_step
Run open-loop evaluation (offline)
examples/simple/openloop_eval.ipynb
If the server is started on a remote machine, set up SSH port forwarding, e.g.,
ssh -L 22085:localhost:22085 songlin@nebula100
Once port forwarding is done, open a new terminal to test whether the server is up:
curl -i http://localhost:22085/health
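Since the server can take a while to load the checkpoint, a small polling loop is more convenient than re-running `curl` by hand. This is a sketch; the retry budget and timeout values are assumptions to tune.

```shell
# Sketch: poll the /health endpoint until the server answers or a small
# retry budget (an assumption; tune as needed) runs out.
URL=http://localhost:22085/health
up=0
for i in 1 2 3; do
  if curl -fsS --max-time 2 "$URL" >/dev/null 2>&1; then
    up=1; echo "server healthy after attempt $i"; break
  fi
  sleep 1
done
[ "$up" -eq 1 ] || echo "server not reachable yet"
```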
Launch the eval client through Docker:
GPUs=1 docker compose run eval $task psi0 \
--host=localhost \
--port=22085 \
--sim-mode=mujoco_isaac \
--headless \
--max-episode-steps=360 \
--num-episodes=10 \
--data-format=lerobot \
--data-dir=data/$task
The policy rollout videos will be found in folder third_party/SIMPLE/data/evals/psi0.
The evaluation of a single episode can take 6 to 10 minutes because SIMPLE uses a synchronous rendering API in Isaac Sim. See here for more explanation.
Download and cache the official Qwen/Qwen3-VL-2B-Instruct weights:
scripts/predownload_qwen3vl.py
Pre-train on the EgoDex dataset
Pre-compute 48 DoF EgoDex action:
We re-use the pre-processing code from H-RDT EgoDex Pre-Processing.
- Change the paths in src/h_rdt/datasets/pretrain/setup_pretrain.sh.
- Tweak NUM_PROCESSES on a powerful server; we tried up to 64.
- Set FORCE_OVERWRITE=True if the processing script was interrupted.
source src/h_rdt/datasets/pretrain/setup_pretrain.sh
source .venv-psi/bin/activate
bash src/h_rdt/datasets/pretrain/run_pretrain_pipeline.sh
bash scripts/train/psi0/pretrain-egodex-psi0-fast.sh
Pre-train on humanoid everyday dataset
bash scripts/train/psi0/pretrain-he-psi0-fast.sh
Save the pretrained checkpoints once training is done:
python scripts/save_pretrain_qwen3vl_backbone.py
Download the pre-trained $\Psi_0$ VLM backbone:
python scripts/data/download.py \
--repo-id=USC-PSI-Lab/psi-model \
--remote-dir=pre.fast.egodex.2512241941/pretrained/ckpt_200000 \
--local-dir=$PSI_HOME/cache/checkpoints/psi0/pre.fast.egodex.2512241941.ckpt200k \
--repo-type=model
Post-train on humanoid everyday (HE) dataset
bash scripts/train/psi0/posttrain-he-psi0.sh
Save the post-trained action expert once training is over:
python scripts/save_posttrain_action_expert.py
The released checkpoints on HuggingFace Psi-Model are listed below:

| Checkpoint | Description | Remote Directory |
|---|---|---|
| (Baseline) | Pre-trained VLM backbone (EgoDex 200K steps + HE 30K steps) | psi0/pre.fast.1by1.2601091803.ckpt.ego200k.he30k |
| (Baseline) | Post-trained Action Expert on HE | psi0/postpre.1by1.pad36.2601131206.ckpt.he30k |
and more variants for ablation studies:

| Checkpoint | Description | Remote Directory |
|---|---|---|
| (Ablation Study) | Pre-trained VLM backbone only on EgoDex 200K steps | psi0/pre.fast.egodex.2512241941.ckpt200k |
| (Ablation Study) | Pre-trained VLM backbone only on HE 48K steps | psi0/pre.abl.only.he.2512311516.48k |
| (Ablation Study) | Pre-trained VLM backbone only on 10% EgoDex | psi0/pre.abl.ego.10per.2602021632.46k |
| (Ablation Study) | Post-trained on HE from pre-trained variant psi0/pre.abl.only.he.2512311516.48k | psi0/postpre.abl.only.he.2602050012 |
| (Ablation Study) | Post-trained on HE from pre-trained variant psi0/pre.abl.ego.10per.2602021632.46k | psi0/postpre.abl.ego.10per.2602050006 |
Download the selected models:
hf download USC-PSI-Lab/psi-model \
--remote-dir=<remote directory on huggingface repo> \
--local-dir=$PSI_HOME/cache/checkpoints \
--repo-type=model
- Lerobot dataset issues:
stack(): argument 'tensors' (position 1) must be tuple of Tensors, not Column
This usually means the environment is still on the legacy PSI lerobot stack. Resync the PSI env so it uses the
same lerobot and datasets versions as SIMPLE, then verify the import layout:
source .venv-psi/bin/activate
uv sync --group psi --active
python -c "from psi.data.lerobot.compat import LEROBOT_LAYOUT; print(LEROBOT_LAYOUT)"
- Fail to install evdev: src/evdev/input.c:10:10: fatal error: Python.h: No such file or directory
sudo apt update
sudo apt install -y python3-dev python3-venv build-essential \
linux-headers-$(uname -r)
- RuntimeError: Could not load libtorchcodec. Likely causes ...
sudo apt-get install ffmpeg
- ImportError: cannot import name 'Deprecated' from 'wandb.proto.wandb_telemetry_pb2'
Re-install wandb:
source .venv-psi/bin/activate
uv pip uninstall wandb
uv pip install wandb==0.18.0
- Support sm_120 on newer GPUs like the 5090 or RTX 6000 (UserWarning: Ignoring invalid value for boolean flag CUDA_LAUNCH_BLOCKING: true, valid values are 0 or 1):
Update torch and flash-attn:
uv pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
uv pip install flash-attn --no-build-isolation
- Failed to download and build lerobot ...: Use git lfs logs last to view the log.
GIT_LFS_SKIP_SMUDGE=1 uv ...
@misc{wei2026psi0,
title={$\Psi_0$: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation},
author={Songlin Wei and Hongyi Jing and Boqian Li and Zhenyu Zhao and Jiageng Mao and Zhenhao Ni and Sicheng He and Jie Liu and Xiawei Liu and Kaidi Kang and Sheng Zang and Weiduo Yuan and Marco Pavone and Di Huang and Yue Wang},
year={2026},
eprint={2603.12263},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2603.12263},
}
This project is licensed under the Apache License 2.0.
See the LICENSE file for details.

