Commit 4b59693

Add support for collecting human data with the RecordHuman PICO VR app (#126)
1 parent e00df3a commit 4b59693

3 files changed (+815, -29 lines)

Lines changed: 110 additions & 29 deletions
.. _human_demo:

Collecting Human Demos
=======================

.. note::

   Make sure you have followed the :doc:`/installation` guide before proceeding.

OpenTau supports training VLAs on human demonstration data collected in LeRobot format. There are two ways to collect human demos:

1. **RecordHuman VR app** (recommended) — record hand and head poses directly from a PICO VR headset with 3D tracking.
2. **MediaPipe video conversion** — extract poses from ordinary MP4 videos using MediaPipe landmark detection.

.. note::

   RecordHuman is recommended because it captures a full 7-D pose (3D position + quaternion orientation) for every hand joint in camera space and tracks head movement through space, giving richer action representations. MediaPipe, by contrast, only provides 3D positions relative to the hand's own center, so it cannot capture how the hand moves through the scene.

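To make that 7-D representation concrete, the sketch below composes the delta between two poses, each a 3-D position plus a (w, x, y, z) unit quaternion. The helper names and layout here are illustrative assumptions, not RecordHuman's actual file format:

```python
import numpy as np

def quat_mul(q: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def quat_conj(q: np.ndarray) -> np.ndarray:
    """Conjugate (= inverse for unit quaternions)."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def delta_pose(prev: np.ndarray, curr: np.ndarray) -> np.ndarray:
    """7-D delta between two 7-D poses: translation difference plus
    the rotation taking prev's orientation to curr's."""
    dp = curr[:3] - prev[:3]
    dq = quat_mul(curr[3:], quat_conj(prev[3:]))
    return np.concatenate([dp, dq])

# No movement between steps -> zero translation, identity rotation.
pose = np.array([0.1, 0.2, 0.3, 1.0, 0.0, 0.0, 0.0])
print(delta_pose(pose, pose))  # [0. 0. 0. 1. 0. 0. 0.]
```

Deltas of this form are a natural way to encode movement through space between consecutive steps, which hand-relative landmark positions alone cannot express.
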
.. _human_demo_vr:

Option 1: RecordHuman VR App (Recommended)
------------------------------------------

The `RecordHuman <https://github.com/TensorAuto/RecordHuman>`_ Unity app runs on a PICO VR headset and records ego-perspective video together with hand and head pose data.

Step 1: Install RecordHuman on a PICO headset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Follow the setup instructions in the `RecordHuman README <https://github.com/TensorAuto/RecordHuman#readme>`_ to install the app on your PICO VR headset.

Step 2: Record demonstrations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Launch RecordHuman on the headset and perform the task you want to demonstrate. The app saves a video file and a JSON pose file for each recording.

Step 3: Convert to LeRobot format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Use ``recordhuman_to_lerobot.py`` to convert the recorded data into a LeRobot dataset that OpenTau can train on.

**Basic usage:**

.. code-block:: bash

   python -m opentau.scripts.recordhuman_to_lerobot \
       --video recording.mp4 \
       --poses recording.json \
       --output ./datasets/my_vr_dataset \
       --prompt "Pick up the snack bag"

**Specify a target FPS** (e.g. 10 Hz). The overlay video, if requested, still uses the original video FPS:

.. code-block:: bash

   python -m opentau.scripts.recordhuman_to_lerobot \
       --video recording.mp4 \
       --poses recording.json \
       --output ./datasets/my_vr_dataset \
       --prompt "Pick up the snack bag" \
       --fps 10

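For intuition, resampling to a lower FPS amounts to keeping roughly one source frame per target timestep. A minimal sketch of this index selection (an illustration of the idea, not the script's internals):

```python
def resample_indices(n_frames: int, src_fps: float, dst_fps: float) -> list[int]:
    """Pick one source-frame index per target-FPS timestep."""
    step = src_fps / dst_fps  # e.g. 30 / 10 = 3.0 source frames per step
    indices: list[int] = []
    t = 0.0
    while round(t) < n_frames:
        indices.append(round(t))
        t += step
    return indices

# A 90-frame clip recorded at 30 FPS, resampled to 10 Hz, keeps
# every 3rd frame: 0, 3, 6, ..., 87 (30 frames total).
print(len(resample_indices(90, 30.0, 10.0)))  # 30
```
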
**Generate a skeleton overlay video** for visual inspection:

.. code-block:: bash

   python -m opentau.scripts.recordhuman_to_lerobot \
       --video recording.mp4 \
       --poses recording.json \
       --output ./datasets/my_vr_dataset \
       --prompt "Pick up the snack bag" \
       --overlay overlay.mp4

The conversion script produces a dataset with:

- Frames as ``observation.images.camera``
- Camera-space hand joint poses (364-D) as ``observation.state``
- Next-step hand state + delta head pose (371-D) as ``action``
- The task prompt you provide

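As a sanity check on those dimensions: assuming the 26 joints per hand used by OpenXR-style hand tracking (an assumption about RecordHuman's layout, not something stated here), the numbers line up:

```python
HANDS = 2
JOINTS_PER_HAND = 26  # assumption: OpenXR XR_HAND_JOINT_COUNT_EXT
POSE_DIM = 7          # 3-D position + 4-D quaternion

state_dim = HANDS * JOINTS_PER_HAND * POSE_DIM  # camera-space hand joints
action_dim = state_dim + POSE_DIM               # + 7-D delta head pose

print(state_dim, action_dim)  # 364 371
```
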
**Full list of options:**

.. code-block:: text

   --video            Path to input video file (required)
   --poses            Path to JSON pose data file (required)
   --output           LeRobot dataset output directory; must not exist (required)
   --prompt           Task description for the episode (required)
   --overlay          Write a skeleton overlay video to this path
   --fov              Vertical FOV in degrees for projection, overlay only (default: 90)
   --time-offset      Seconds added to video timestamps for pose alignment (default: 0)
   --tracking-origin  XR origin offset subtracted from head_pos (default: -0.32 2.0276 0)
   --eye-offset       Horizontal eye offset in meters (default: 0.06)
   --fps              Output dataset FPS; defaults to the video's FPS
   --overlay-codec    FourCC codec for the overlay video (default: mp4v)

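The ``--fov`` value drives a pinhole-style projection when drawing the overlay. A rough sketch of how a vertical FOV maps camera-space points to pixels (illustrative only; the axis conventions are an assumption, not the script's code):

```python
import math

def project(point_cam: tuple[float, float, float],
            fov_v_deg: float, width: int, height: int) -> tuple[float, float]:
    """Project a camera-space point (x right, y up, z forward) to pixels."""
    # Focal length in pixels, derived from the vertical field of view.
    f = (height / 2) / math.tan(math.radians(fov_v_deg) / 2)
    x, y, z = point_cam
    u = width / 2 + f * x / z
    v = height / 2 - f * y / z  # flip y: image rows grow downward
    return u, v

# With a 90-degree vertical FOV, f equals half the image height, so a
# point at x == z lands half an image-height to the right of center.
print(project((1.0, 0.0, 1.0), 90.0, 1920, 1080))  # approx (1500.0, 540.0)
```
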
.. _human_demo_mediapipe:

Option 2: MediaPipe Video Conversion
-------------------------------------

If you don't have a VR headset, you can convert ordinary MP4 videos of human demonstrations (exo = full body in frame, ego = hand(s) in frame) into LeRobot datasets. The ``human_video_to_lerobot.py`` script uses MediaPipe for pose (third-person / exo) or hand (first-person / ego) landmark detection and writes frames, 3D landmarks as state, and next-step landmarks as action.

Each video becomes one episode with:

- Frames as ``observation.images.camera``
- 3D pose or hand landmarks as ``observation.state``
- Next-step landmarks as ``action``
- A task prompt you provide (e.g. "Pick up the cup")

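The state/action pairing above can be sketched as a one-step shift: the action at step ``t`` is the landmark vector at step ``t + 1``. Repeating the final state for the last action is one common convention; whether the script does exactly this is an assumption:

```python
import numpy as np

def make_pairs(landmarks: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Given per-step landmark vectors of shape (T, D), return
    (states, actions) where actions are the next-step landmarks."""
    states = landmarks
    actions = np.vstack([landmarks[1:], landmarks[-1:]])  # repeat last step
    return states, actions

traj = np.arange(12, dtype=float).reshape(4, 3)  # 4 steps of 3-D landmarks
states, actions = make_pairs(traj)
print(actions[0].tolist())   # [3.0, 4.0, 5.0] -- the state at step 1
print(actions[-1].tolist())  # [9.0, 10.0, 11.0] -- last state, repeated
```
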
Converting videos
^^^^^^^^^^^^^^^^^

From the project root, run the conversion script. The **output path is the LeRobot dataset root** and must not exist yet.

**Single video (exo — third-person pose):**

.. code-block:: bash

   python -m opentau.scripts.human_video_to_lerobot \
       /path/to/demo.mp4 \
       ./datasets/my_exo_dataset \
       --prompt "Pick up the red block"

**Single video (ego — first-person hands):**

.. code-block:: bash

   python -m opentau.scripts.human_video_to_lerobot \
       /path/to/ego_demo.mp4 \
       ./datasets/my_ego_dataset \
       --prompt "Open the drawer" \
       --mode ego

**Use a specific FPS for the dataset** (e.g. 10 Hz). The overlay video (if requested) still uses the original video FPS:

.. code-block:: bash

   python -m opentau.scripts.human_video_to_lerobot \
       /path/to/demo.mp4 \
       ./datasets/my_dataset \
       --prompt "Place the cup on the table" \
       --fps 10

**Save a landmark-overlay video** for inspection:

.. code-block:: bash

   python -m opentau.scripts.human_video_to_lerobot \
       /path/to/demo.mp4 \
       ./datasets/my_dataset \
       --prompt "Pick up the cup" \
       --overlay /path/to/overlay.mp4

pyproject.toml

Lines changed: 2 additions & 0 deletions

@@ -168,6 +168,8 @@ select = ["E4", "E7", "E9", "F", "I", "N", "B", "C4", "SIM"]
 [tool.ruff.lint.per-file-ignores]
 # Server must implement gRPC interface with PascalCase method names
 "src/opentau/scripts/grpc/server.py" = ["N802"]
+# Uppercase names for rotation matrices follow standard math convention
+"src/opentau/scripts/recordhuman_to_lerobot.py" = ["N803", "N806"]

 [tool.bandit]
 exclude_dirs = [
