.. _human_demo:

Collecting Human Demos
======================

.. note::

   Make sure you have followed the :doc:`/installation` guide before proceeding.

OpenTau supports training VLAs on human demonstration data collected in LeRobot format. There are two ways to collect human demos:

1. **RecordHuman VR app** (recommended): record hand and head poses directly from a PICO VR headset with 3D tracking.
2. **MediaPipe video conversion**: extract poses from ordinary MP4 videos using MediaPipe landmark detection.

.. note::

   RecordHuman is recommended because it captures a full 7-D pose (3D position plus quaternion orientation) for every hand joint in camera space and tracks head movement through space, giving a richer action representation.
   MediaPipe, by contrast, only provides 3D positions relative to the hand's own center, so it cannot capture how the hand moves through the scene.

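The 364-D state and 371-D action sizes used later in this guide fall out of this 7-D-per-joint format. The arithmetic below is a sketch that assumes a 26-joint hand skeleton (the joint count used by OpenXR-style hand tracking; it is our assumption here, not stated elsewhere in this guide):

```python
# Sketch: where the 364-D state / 371-D action dimensions can come from.
# Assumes the 26-joint hand skeleton of OpenXR-style hand tracking
# (our assumption; the exact joint count is not stated in this guide).

POSE_DIM = 7          # 3D position + unit quaternion orientation
JOINTS_PER_HAND = 26  # assumed OpenXR-style hand skeleton
HANDS = 2

state_dim = HANDS * JOINTS_PER_HAND * POSE_DIM  # per-frame hand state
action_dim = state_dim + POSE_DIM               # next-step state + delta head pose

print(state_dim, action_dim)  # 364 371
```

Under that assumption the numbers line up exactly: 2 hands x 26 joints x 7 = 364, and appending one 7-D delta head pose gives 371.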
.. _human_demo_vr:

Option 1: RecordHuman VR App (Recommended)
------------------------------------------

The `RecordHuman <https://github.com/TensorAuto/RecordHuman>`_ Unity app runs on a PICO VR headset and records ego-perspective video together with hand and head pose data.

Step 1: Install RecordHuman on a PICO headset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Follow the setup instructions in the `RecordHuman README <https://github.com/TensorAuto/RecordHuman#readme>`_ to install the app on your PICO VR headset.

Step 2: Record demonstrations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Launch RecordHuman on the headset and perform the task you want to demonstrate. The app saves a video file and a JSON pose file for each recording.

Step 3: Convert to LeRobot format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Use ``recordhuman_to_lerobot.py`` to convert the recorded data into a LeRobot dataset that OpenTau can train on.

**Basic usage:**

.. code-block:: bash

   python -m opentau.scripts.recordhuman_to_lerobot \
       --video recording.mp4 \
       --poses recording.json \
       --output ./datasets/my_vr_dataset \
       --prompt "Pick up the snack bag"

**Specify a target FPS** (e.g. 10 Hz). The overlay video, if requested, still uses the original video's FPS:

.. code-block:: bash

   python -m opentau.scripts.recordhuman_to_lerobot \
       --video recording.mp4 \
       --poses recording.json \
       --output ./datasets/my_vr_dataset \
       --prompt "Pick up the snack bag" \
       --fps 10

**Generate a skeleton overlay video** for visual inspection:

.. code-block:: bash

   python -m opentau.scripts.recordhuman_to_lerobot \
       --video recording.mp4 \
       --poses recording.json \
       --output ./datasets/my_vr_dataset \
       --prompt "Pick up the snack bag" \
       --overlay overlay.mp4

The conversion script produces a dataset with:

- Frames as ``observation.images.camera``
- Camera-space hand joint poses (364-D) as ``observation.state``
- Next-step hand state plus delta head pose (371-D) as ``action``
- The task prompt you provide

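Before training, it is worth sanity-checking the converted dataset. The sketch below assumes a LeRobot v2-style layout in which feature metadata lives in ``meta/info.json`` at the dataset root; the exact layout may differ across LeRobot versions, so treat this as a starting point rather than an official API:

```python
import json
from pathlib import Path


def feature_shapes(root: str) -> dict:
    """Read a LeRobot dataset's metadata and return its feature shapes.

    Assumes the v2-style layout with ``meta/info.json`` at the dataset
    root; adjust the path if your LeRobot version differs.
    """
    info = json.loads((Path(root) / "meta" / "info.json").read_text())
    return {name: spec.get("shape") for name, spec in info.get("features", {}).items()}


# Hypothetical usage with the output path from the command above:
# shapes = feature_shapes("./datasets/my_vr_dataset")
# expect shapes["observation.state"] == [364] and shapes["action"] == [371]
```

If the reported shapes are not 364-D state and 371-D action, the pose file and video were probably not paired correctly.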
**Full list of options:**

.. code-block:: text

   --video            Path to the input video file (required)
   --poses            Path to the JSON pose data file (required)
   --output           LeRobot dataset output directory; must not already exist (required)
   --prompt           Task description for the episode (required)
   --overlay          Write a skeleton overlay video to this path
   --fov              Vertical FOV in degrees for projection; overlay only (default: 90)
   --time-offset      Seconds added to video timestamps for pose alignment (default: 0)
   --tracking-origin  XR origin offset subtracted from head_pos (default: -0.32 2.0276 0)
   --eye-offset       Horizontal eye offset in meters (default: 0.06)
   --fps              Output dataset FPS; defaults to the video's FPS
   --overlay-codec    FourCC codec for the overlay video (default: mp4v)

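``--fps`` and ``--time-offset`` together determine how output frames are matched to recorded pose samples. The sketch below illustrates the idea only (it is not the script's actual implementation): each output frame time at the target FPS is shifted by the offset and paired with the nearest pose timestamp:

```python
# Illustration of frame-to-pose alignment (not the converter's actual code):
# each output frame at the target FPS gets the pose sample whose timestamp
# is closest to the offset-shifted frame time.

def align_frames_to_poses(pose_times, duration, fps, time_offset=0.0):
    """Return, for each output frame, the index of the nearest pose sample."""
    n_frames = int(duration * fps)
    indices = []
    for i in range(n_frames):
        t = i / fps + time_offset
        nearest = min(range(len(pose_times)), key=lambda j: abs(pose_times[j] - t))
        indices.append(nearest)
    return indices


# Poses recorded at ~30 Hz, dataset resampled to 10 Hz:
pose_times = [k / 30.0 for k in range(90)]  # 3 s of pose samples
idx = align_frames_to_poses(pose_times, duration=3.0, fps=10)
print(idx[:4])  # [0, 3, 6, 9]
```

A nonzero ``--time-offset`` shifts every frame time before matching, which is what lets you correct a constant lag between the video and the pose log.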

.. _human_demo_mediapipe:

Option 2: MediaPipe Video Conversion
-------------------------------------

If you don't have a VR headset, you can convert ordinary MP4 videos of human demonstrations into LeRobot datasets. The ``human_video_to_lerobot.py`` script uses MediaPipe for pose (third-person / exo, full body in frame) or hand (first-person / ego, hands in frame) landmark detection, and writes the frames, the detected 3D landmarks as the state, and the next step's landmarks as the action.

Each video becomes one episode with:

- Frames as ``observation.images.camera``
- 3D pose or hand landmarks as ``observation.state``
- Next-step landmarks as ``action``
- A task prompt you provide (e.g. "Pick up the cup")
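The state/action relationship above is simple: the action at frame ``t`` is the landmark vector observed at frame ``t + 1``. A schematic sketch of that pairing (illustrative only; the converter's internals, including how it handles the final frame, may differ):

```python
def next_step_actions(states):
    """Pair each state with the next frame's landmarks as its action.

    ``states`` is a list of per-frame landmark vectors. The last frame
    has no successor, so here it repeats its own state (the converter's
    actual end-of-episode handling may differ).
    """
    return [states[t + 1] if t + 1 < len(states) else states[t]
            for t in range(len(states))]


states = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.4]]  # toy 2-D "landmarks"
actions = next_step_actions(states)
print(actions)  # [[0.1, 0.2], [0.2, 0.4], [0.2, 0.4]]
```

Because actions are just time-shifted states, a model trained on this data learns to predict where the landmarks move next.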

Converting videos
^^^^^^^^^^^^^^^^^

From the project root, run the conversion script. The **output path is the LeRobot dataset root** and must not exist yet.

**Single video (exo — third-person pose):**

.. code-block:: bash

   python -m opentau.scripts.human_video_to_lerobot \
       /path/to/demo.mp4 \
       ./datasets/my_exo_dataset \
       --prompt "Pick up the red block"

**Single video (ego — first-person hands):**

.. code-block:: bash

   python -m opentau.scripts.human_video_to_lerobot \
       /path/to/ego_demo.mp4 \
       ./datasets/my_ego_dataset \
       --prompt "Open the drawer" \
       --mode ego

**Use a specific FPS for the dataset** (e.g. 10 Hz). The overlay video, if requested, still uses the original video's FPS:

.. code-block:: bash

   python -m opentau.scripts.human_video_to_lerobot \
       /path/to/demo.mp4 \
       ./datasets/my_dataset \
       --prompt "Place the cup on the table" \
       --fps 10

**Save a landmark-overlay video** for inspection:

.. code-block:: bash

   python -m opentau.scripts.human_video_to_lerobot \
       /path/to/demo.mp4 \
       ./datasets/my_dataset \
       --prompt "Pick up the cup" \
       --overlay /path/to/overlay.mp4
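
Each run converts one video into one new dataset directory. To convert a whole folder of videos, a small driver can invoke the script once per file. The sketch below is one way to do it (the directory names and the single shared prompt are hypothetical; remember that each output directory must not already exist):

```python
import subprocess
import sys
from pathlib import Path


def build_command(video: Path, out_root: Path, prompt: str) -> list:
    """Build one conversion command; the output dir is named after the video."""
    out_dir = out_root / video.stem  # e.g. datasets/ep1 for ep1.mp4
    return [
        sys.executable, "-m", "opentau.scripts.human_video_to_lerobot",
        str(video), str(out_dir),
        "--prompt", prompt,
    ]


def convert_all(video_dir: str, out_root: str, prompt: str) -> None:
    """Convert every MP4 in ``video_dir`` into its own LeRobot dataset."""
    for video in sorted(Path(video_dir).glob("*.mp4")):
        subprocess.run(build_command(video, Path(out_root), prompt), check=True)


# Hypothetical usage:
# convert_all("./demos", "./datasets", "Pick up the cup")
```

Per-video flags such as ``--mode ego``, ``--fps``, or per-video prompts can be added to ``build_command`` as needed.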