mini-pi0 is a modular research and development codebase for training and evaluating flow-matching robot action policies from demonstrations.
It includes:
- unified CLI for train / eval / deploy-sim / vision precompute
- typed YAML config system with CLI overrides
- modular simulator adapters (Robosuite runtime, ManiSkill3 + IsaacLab scaffolds)
- native `robomimic_hdf5` and `lerobot_hf` dataset loading
- image and precomputed-vision conditioning pipelines
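For reference, these policies are trained with a conditional flow-matching objective; a standard form (a sketch only, the exact parameterization in `models/` may differ) is

$$\mathcal{L}(\theta)=\mathbb{E}_{t\sim\mathcal{U}(0,1),\;a_0\sim\mathcal{N}(0,I),\;(o,a_1)\sim\mathcal{D}}\big\|v_\theta(a_t,t,o)-(a_1-a_0)\big\|^2,\qquad a_t=(1-t)\,a_0+t\,a_1,$$

where $o$ is the conditioning observation and $a_1$ a demonstrated action chunk; at deployment the learned velocity field $v_\theta$ is integrated from noise over a few flow steps (cf. `n_flow_steps`).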
Simulator backend status:
- Robosuite: full runtime support
- ManiSkill3: scaffolded
- IsaacLab: scaffolded
Check local backend diagnostics:
```
.venv/bin/python -m mini_pi0 backends
```

Repository layout:

```
mini_pi0/
  cli/      # unified CLI entrypoints
  config/   # typed dataclass schema + YAML load/merge
  sim/      # simulator adapter API + backend implementations
  dataset/  # dataset readers, episode loader, torch dataset, stats
  models/   # model registry + flow matching policy
  vision/   # vision backbones + feature precompute
  train/    # training runner
  eval/     # evaluation loop, metrics, plots, grids
  deploy/   # simulation deployment loop
  utils/    # run directory and device helpers
examples/configs/
  robosuite_can_vision.yaml
  robosuite_lift.yaml
  robosuite_lift_lerobot.yaml
  robosuite_lift_robomimic.yaml
  maniskill3_pickcube.yaml    # scaffold
  isaaclab_scaffold.yaml      # scaffold
```
```
# 1) create venv
uv venv --python 3.13 .venv
source .venv/bin/activate

# 2) install dependencies
uv sync --extra dev --extra lerobot --extra vision --extra hardware
```

If you prefer a pip-style install:

```
uv pip install -r requirements.txt
```

Supported formats: `robomimic_hdf5` and `lerobot_hf`.
Detailed guide:
Download robomimic example:
```
.venv/bin/python -m mini_pi0 download-robomimic \
--task can \
--dataset_type ph \
--hdf5_type low_dim \
--download_dir data/robomimic
```

Download directly from Hugging Face dataset repos:
```
# LeRobot dataset snapshot (uses local HF cache + saves under data/lerobot)
.venv/bin/hf download robotgeneralist/robosuite_can_ph \
--repo-type dataset \
--local-dir data/lerobot/robosuite_can_ph

# Single robomimic file from HF
.venv/bin/hf download robomimic/robomimic_datasets \
--repo-type dataset \
v1.5/can/ph/low_dim_v15.hdf5 \
--local-dir data/robomimic/can/ph
```

Use these fields in your config for reliable loading:
- `data.format`: `robomimic_hdf5` or `lerobot_hf`
- `robot.image_key`: visual observation key used for policy conditioning
- `robot.image_keys`: optional ordered list of visual keys for multi-camera conditioning
- `robot.state_keys`: state vector keys used by all model paths (train/eval/deploy)
- `data.lerobot_action_key` and `data.lerobot_episode_index_key` for `lerobot_hf`
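For intuition, dotted `--set` keys map onto nested config fields. A minimal sketch of that merge (illustrative only, not mini-pi0's actual loader, which is typed and validated in `mini_pi0/config/`):

```python
# Sketch: apply a dotted --set override onto a nested config dict.
def apply_override(cfg: dict, dotted_key: str, value) -> dict:
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return cfg

cfg = {"data": {"format": "robomimic_hdf5"}}
apply_override(cfg, "data.format", "lerobot_hf")
apply_override(cfg, "robot.image_key", "observation.images.right_wrist_0_rgb")
print(cfg)
```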
Recommended key setup for Robosuite can (`robotgeneralist/robosuite_can_ph`):
- `robot.image_key='observation.images.right_wrist_0_rgb'`
- `robot.image_keys=['observation.images.right_wrist_0_rgb']`
- `robot.state_keys=['observation.state.eef_pos','observation.state.eef_quat','observation.state.tool','observation.state.object']`
Notes:
- If `robot.state_keys` is missing, the pipeline falls back to `robot.proprio_keys`.
- If `robot.image_keys` is set, it overrides `robot.image_key` and preserves key order.
- In `obs_mode=image`, multiple cameras are fused side-by-side (same channels, wider image).
- In `obs_mode=feature`, per-camera features are concatenated, so the feature dim scales with camera count (both fusion modes are sketched after these notes).
- Changing `image_key`/`image_keys`/`state_keys` changes the model inputs, so old checkpoints become incompatible.
- On macOS, prefer `data.lerobot_video_backend='pyav'`.
- In precomputed mode, `data.precomputed_features_path` should point to the feature directory or archive produced by `precompute-vision`.
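A minimal sketch of the two multi-camera fusion conventions described above (illustrative shapes only, not the library's internal code):

```python
import numpy as np

def fuse_images_side_by_side(images):
    # obs_mode=image: same channels, wider image -> (H, W * n_cams, C)
    return np.concatenate(images, axis=1)

def fuse_camera_features(features):
    # obs_mode=feature: per-camera feature vectors concatenated -> (D * n_cams,)
    return np.concatenate(features, axis=0)

cams = [np.zeros((224, 224, 3)), np.zeros((224, 224, 3))]
print(fuse_images_side_by_side(cams).shape)  # (224, 448, 3)

feats = [np.zeros(768), np.zeros(768)]
print(fuse_camera_features(feats).shape)     # (1536,)
```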
Expected source schema:
- `robomimic_hdf5`: `/<data_group>/<demo_k>/actions` and `/<data_group>/<demo_k>/obs/...`
- `lerobot_hf`: flattened feature keys such as `observation.images.*`, `observation.state.*`, `action`, `episode_index`
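To sanity-check a `robomimic_hdf5` file against this schema, a quick `h5py` inspection works (assuming the conventional data group `data`; adjust the path to wherever your download landed):

```python
import h5py

with h5py.File("data/robomimic/can/ph/low_dim_v15.hdf5", "r") as f:
    demos = sorted(f["data"].keys())   # demo_0, demo_1, ...
    demo = f["data"][demos[0]]
    print(demo["actions"].shape)       # (T, action_dim)
    print(list(demo["obs"].keys()))    # per-step observation streams
```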
This is the recommended end-to-end path.

1) Precompute wrist-camera vision features:
```
.venv/bin/python -u -m mini_pi0 precompute-vision \
--config examples/configs/robosuite_can_vision.yaml \
--set data.format=lerobot_hf \
--set data.lerobot_repo_id='robotgeneralist/robosuite_can_ph' \
--set data.lerobot_video_backend='pyav' \
--set robot.image_key='observation.images.right_wrist_0_rgb' \
--set robot.image_keys="['observation.images.right_wrist_0_rgb']" \
--set robot.state_keys="['observation.state.eef_pos','observation.state.eef_quat','observation.state.tool']" \
--vision_backend timm \
--vision_model_name 'vit_base_patch16_dinov3.lvd1689m' \
--vision_pretrained \
--precomputed_features_path data/features/can_wrist_dinov3_vitb16
```

2a) Train policy on precomputed features:

```
.venv/bin/python -u -m mini_pi0 train \
--config examples/configs/robosuite_can_vision.yaml \
--observation_mode precomputed \
--precomputed_features_path data/features/can_wrist_dinov3_vitb16 \
--set model.obs_mode=feature \
--set train.val_ratio=0.1 \
--set train.ema_decay=0.999 \
--set train.checkpoint_use_ema=true \
--set train.device=auto
```
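The `train.ema_decay=0.999` and `train.checkpoint_use_ema=true` flags above enable checkpointing an exponential moving average of the weights. A sketch of the usual update (mini-pi0's trainer may differ in detail):

```python
import torch

@torch.no_grad()
def ema_update(ema_model, model, decay=0.999):
    # Blend current weights into the EMA copy after each optimizer step.
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1.0 - decay)
```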
2b) Resume training from a previous run checkpoint:

```
.venv/bin/python -u -m mini_pi0 train \
--config examples/configs/robosuite_can_vision.yaml \
--observation_mode precomputed \
--precomputed_features_path data/features/can_wrist_dinov3_vitb16 \
--resume_from runs/robosuite-can-fm-vision/run1/checkpoints/best.pt \
--resume_optimizer \
--set model.obs_mode=feature \
--set train.device=auto
```

3) Evaluate with verbose live logs:

```
.venv/bin/python -u -m mini_pi0 eval \
--config examples/configs/robosuite_can_vision.yaml \
--set eval.checkpoint='runs/robosuite-can-fm-vision/run1/checkpoints/best.pt' \
--set eval.action_stats_path='runs/robosuite-can-fm-vision/run1/artifacts/action_stats.json' \
--set eval.run_dir='runs/robosuite-can-fm-vision/run1' \
--set model.obs_mode=feature \
--set model.vision_dim=768 \
--set vision.use_runtime_extractor=true \
--set vision.model_name='vit_base_patch16_dinov3.lvd1689m' \
--verbose --log_every_episodes 1
```

Minimal eval command:

```
.venv/bin/python -u -m mini_pi0 eval \
--config examples/configs/robosuite_can_vision.yaml \
--set eval.checkpoint='runs/robosuite-can-fm-vision/run1/checkpoints/best.pt' \
--set eval.action_stats_path='runs/robosuite-can-fm-vision/run1/artifacts/action_stats.json'
```

Useful eval options:
- `--verbose --log_every_episodes 1` for per-episode progress logs
- `--strict_parity` (default) to fail fast on checkpoint/runtime config mismatches
- `--set eval.action_smoothing_alpha=0.2` to smooth actions between replans (sketched after this list)
- `--set eval.action_scale='[1,1,1,1,1,1,1]'` for per-dimension action scaling
- `--set eval.record_grid=true` to save success/failure 3x3 grid videos
- `--set eval.max_steps=200` to cap the rollout horizon
- `--set eval.run_dir='runs/<exp>/runN'` to write metrics into an existing run
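Action smoothing blends consecutive action chunks at replan boundaries. One plausible reading, assuming simple exponential smoothing (the exact formula in the eval loop may differ):

```python
import numpy as np

def smooth_action(prev_action, new_action, alpha=0.2):
    # alpha weights the previously executed action; alpha=0 disables smoothing.
    return alpha * np.asarray(prev_action) + (1.0 - alpha) * np.asarray(new_action)
```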
Metric interpretation:
- `success_rate`: fraction of episodes that reached task success
- `CI95`: bootstrap 95% confidence interval over the success rate (see the sketch below)
- `episode_len_mean`/`episode_len_std`: rollout length statistics in environment steps
- `reward_mean`: average accumulated reward
- `infer_ms_mean`: average model inference latency per predicted chunk
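For reference, a bootstrap 95% CI over the success rate can be computed as below (an illustrative sketch; the evaluator's exact resampling setup may differ):

```python
import numpy as np

def bootstrap_ci95(successes, n_resamples=10_000, seed=0):
    # Resample episodes with replacement, then take the 2.5/97.5 percentiles
    # of the resampled success rates.
    rng = np.random.default_rng(seed)
    s = np.asarray(successes, dtype=float)
    idx = rng.integers(0, len(s), size=(n_resamples, len(s)))
    means = s[idx].mean(axis=1)
    return np.percentile(means, [2.5, 97.5])

print(bootstrap_ci95([1, 0, 1, 1, 0, 1, 1, 1, 0, 1]))
```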
Output files:
- `metrics/eval_summary.json`: scalar metrics
- `metrics/eval_arrays.json`: per-episode raw arrays
- `metrics/eval_provenance.json`: checkpoint/runtime parity report + diff
- `metrics/eval_config_requested.yaml`: config requested by the CLI before checkpoint injection
- `metrics/eval_config_runtime.yaml`: final runtime config after checkpoint model injection
- `artifacts/eval_metrics.png`: plots of the success trend, episode length, and reward distribution
- `artifacts/success_grid_*.mp4` and `artifacts/failure_grid_*.mp4` when grid recording is enabled
Failure diagnostics in `eval_summary.json`:
- `failure_reason_counts`: breakdown by reason (`no_progress`, `timeout_after_progress`, `drop_or_unstable`, ...)
- `action_clip_fraction_mean`: fraction of actions clipped by the env action bounds
- `max_step_reward_mean`: best per-step reward reached, averaged over episodes
Run rollout hyperparameter sweeps without editing code:
```
.venv/bin/python -u -m mini_pi0 ablate-eval \
--config examples/configs/robosuite_can_vision.yaml \
--checkpoint runs/robosuite-can-fm-vision/run1/checkpoints/best.pt \
--action_stats runs/robosuite-can-fm-vision/run1/artifacts/action_stats.json \
--n_episodes 10 \
--execute_steps_values 1,2,4 \
--n_flow_steps_values 10,15,30 \
--smoothing_values 0.0,0.2
```

The example above sweeps a 3 x 3 x 2 grid (18 rollout configurations) at 10 episodes each. Artifacts are saved under `runs/<experiment>-ablation/runN/` with:
- `metrics/ablation_metrics.jsonl`
- `metrics/ablation_summary.json`
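A small post-processing helper to rank sweep entries by success rate (the field names here are assumptions; inspect the JSONL to confirm what `ablate-eval` actually writes):

```python
import json

path = "runs/robosuite-can-fm-vision-ablation/run1/metrics/ablation_metrics.jsonl"
with open(path) as f:
    rows = [json.loads(line) for line in f if line.strip()]

# Assumes each line carries the swept values plus a success_rate field.
rows.sort(key=lambda r: r.get("success_rate", 0.0), reverse=True)
print(rows[0])
```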
- `robot.state_keys` is the preferred definition for policy state inputs.
- If `robot.state_keys` is not set, the pipeline falls back to `robot.proprio_keys`.
- If you change the camera view or state keys, retrain the model for consistent behavior.
List available vision backbones:

```
.venv/bin/python -m mini_pi0 vision-models
.venv/bin/python -m mini_pi0 vision-models --backend timm
.venv/bin/python -m mini_pi0 vision-models --backend timm --all_timm
```

Each run is written as ordered folders:

```
runs/<experiment_name>/run1/
runs/<experiment_name>/run2/
...
```
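The ordered `runN` allocation can be pictured as follows (a hypothetical sketch; the real helper lives in `mini_pi0/utils/`):

```python
from pathlib import Path

def next_run_dir(experiment: str, root: str = "runs") -> Path:
    # Find existing runN folders and allocate the next index.
    base = Path(root) / experiment
    base.mkdir(parents=True, exist_ok=True)
    indices = [int(p.name[3:]) for p in base.glob("run*") if p.name[3:].isdigit()]
    run_dir = base / f"run{max(indices, default=0) + 1}"
    run_dir.mkdir()
    return run_dir
```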
Typical outputs:
- `checkpoints/best.pt`
- `artifacts/action_stats.json`
- `metrics/train_summary.json`
- `metrics/eval_summary.json`
Hardware deploy helper remains separate:

```
.venv/bin/python deploy_so100.py ...
```
