
Commit 7f30ecf

Adding fixes for dataset docs

1 parent 067ceaf commit 7f30ecf

14 files changed: +335 −182 lines changed


docs/source/model.rst

Lines changed: 3 additions & 3 deletions
@@ -6,7 +6,7 @@ This is the documentation for the supported models in OpenTau.
 pi05
 ----
 - Pi05 is a state of the art Vision-language-action flow model for general robot control. It supports both autoregressive discrete actions and flow matching continuous actions.
-- More details can be found in the `paper <https://www.pi.website/download/pi05.pdf>`_.
+- More details can be found in the `pi05 paper <https://www.pi.website/download/pi05.pdf>`_.
 - See the implementation in `src/opentau/policies/pi05/modeling_pi05.py`.
 - Checkpoint of the model finetuned on the LIBERO dataset is available on Hugging Face: `TensorAuto/tPi05-Libero <https://huggingface.co/TensorAuto/tPi05-Libero>`_
 - Disclaimer: Our implementation doesn't support sub-task prediction yet, as mentioned in the paper.
@@ -15,13 +15,13 @@ pi05
 pi0
 ----
 - Pi0 is a Vision-language-action flow model that only supports flow matching continuous actions.
-- More details can be found in the `paper <https://www.pi.website/download/pi0.pdf>`_.
+- More details can be found in the `pi0 paper <https://www.pi.website/download/pi0.pdf>`_.
 - See the implementation in `src/opentau/policies/pi0/modeling_pi0.py`.
 - This model can be changed to pi0-star by changing the `advantage_always_on` flag to `on`/'use' in the config file.
 - Checkpoint of the model finetuned on the LIBERO dataset is available on Hugging Face: `TensorAuto/tPi0-Libero <https://huggingface.co/TensorAuto/tPi0-Libero>`_
 
 value
 -----
 - Value model is a Vision-language model used to predict the value of the current state. Its used to train VLA policies with RECAP framework.
-- More details can be found in the `paper <https://www.pi.website/download/pistar06.pdf>`_.
+- More details can be found in the `pi*06 paper <https://www.pi.website/download/pistar06.pdf>`_.
 - See the implementation in `src/opentau/policies/value/modeling_value.py`.

docs/source/tutorials/datasets.rst

Lines changed: 6 additions & 6 deletions
@@ -9,7 +9,7 @@ Building a dataset mixture
 
 You can define a dataset mixture in your configuration file using the ``dataset_mixture`` key. Here is an example:
 
-.. code-block:: json
+.. code-block:: javascript
 
     {
      "dataset_mixture": {
@@ -30,21 +30,21 @@ You can define a dataset mixture in your configuration file using the ``dataset_
       ...
     }
 
-For each new dataset, you must add an entry to ``opentau/datasets/standard_data_format_mapping.py`` to map the dataset features to the Standard Data Format (see the :ref:`Standard Data Format section <concepts/standard-data-format>` in the Concepts documentation).
+For each new dataset, you must add an entry to ``opentau/datasets/standard_data_format_mapping.py`` to map the dataset features to the Standard Data Format.
 Alternatively, you can provide a custom mapping in the dataset config using the ``data_features_name_mapping`` and ``loss_type_mapping`` keys.
 For example:
 
-.. code-block:: json
+.. code-block:: javascript
 
     {
      "dataset_mixture": {
      "datasets": [
       {
-       "repo_id": "physical-intelligence/libero"
+       "repo_id": "physical-intelligence/libero",
        "data_features_name_mapping": {
         "camera0": "observation.images.exterior_image_1_left",
-        "camera1": "observation.images.exterior_image_2_left",
-       }
+        "camera1": "observation.images.exterior_image_2_left"
+       },
        "loss_type_mapping": "MSE"
       },
       {
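The ``data_features_name_mapping`` above only renames feature keys into the Standard Data Format. As a rough illustration of what such a mapping does to one sample, here is a minimal sketch; the helper name and sample layout are hypothetical, not OpenTau's actual code:

```python
# Hypothetical sketch: applying a data_features_name_mapping to one sample.
# apply_feature_mapping is an illustrative helper, not part of OpenTau.

def apply_feature_mapping(sample: dict, mapping: dict) -> dict:
    """Rename dataset-specific keys to their Standard Data Format names."""
    # Keys not listed in the mapping pass through unchanged.
    return {mapping.get(key, key): value for key, value in sample.items()}

mapping = {
    "camera0": "observation.images.exterior_image_1_left",
    "camera1": "observation.images.exterior_image_2_left",
}
raw = {"camera0": "frame_a", "camera1": "frame_b", "action": [0.1, 0.2]}
standardized = apply_feature_mapping(raw, mapping)
# "camera0"/"camera1" are renamed; "action" passes through untouched.
```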

src/opentau/datasets/__init__.py

Lines changed: 71 additions & 0 deletions
@@ -11,3 +11,74 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+"""Dataset management and processing utilities for robot learning and vision-language tasks.
+
+This module provides a comprehensive toolkit for loading, creating, managing, and
+processing datasets for training vision-language-action (VLA) models. It supports
+both robot learning datasets (with actions and states) and vision-language
+grounding datasets (for multimodal understanding tasks).
+
+The module is organized into several key components:
+
+- **Core Datasets**: LeRobotDataset for robot learning data with support for
+  temporal alignment, multi-modal data, and version compatibility.
+- **Grounding Datasets**: Vision-language datasets (CLEVR, COCO-QA, PIXMO, VSR)
+  for training visual understanding without robot actions.
+- **Dataset Mixtures**: WeightedDatasetMixture for combining multiple datasets
+  with controlled sampling proportions.
+- **Data Processing**: Utilities for statistics computation, image/video
+  handling, transforms, and format standardization.
+- **Factory Functions**: High-level functions for creating datasets and mixtures
+  from configuration objects.
+
+Key Features:
+
+- **HuggingFace Integration**: Seamless loading from HuggingFace Hub with
+  automatic version checking and backward compatibility.
+- **Temporal Alignment**: Delta timestamps enable sampling features at
+  different time offsets with optional Gaussian noise for data augmentation.
+- **Multi-modal Support**: Handles images, videos, state vectors, actions,
+  and text prompts with automatic format conversion.
+- **Weighted Sampling**: Combine heterogeneous datasets with configurable
+  sampling weights for balanced training.
+- **Standard Data Format**: Unified data format across all datasets for
+  consistent model input/output interfaces.
+- **Statistics Management**: Automatic computation and aggregation of dataset
+  statistics for normalization.
+- **Video Handling**: Multiple video backends (torchcodec, pyav, video_reader)
+  for efficient frame extraction and encoding.
+- **Asynchronous I/O**: High-performance image writing for real-time data
+  recording without blocking.
+
+Main Modules:
+
+- **lerobot_dataset**: Core dataset implementation for robot learning data.
+- **grounding**: Vision-language grounding datasets (CLEVR, COCO-QA, PIXMO, VSR).
+- **dataset_mixture**: Weighted combination of multiple datasets.
+- **factory**: Factory functions for creating datasets from configurations.
+- **utils**: Utility functions for I/O, metadata management, and validation.
+- **compute_stats**: Statistics computation and aggregation utilities.
+- **transforms**: Image transformation pipelines for data augmentation.
+- **video_utils**: Video encoding, decoding, and metadata extraction.
+- **image_writer**: Asynchronous image writing for high-frequency recording.
+- **sampler**: Episode-aware sampling with boundary frame filtering.
+- **standard_data_format_mapping**: Feature name and loss type mappings.
+
+Example:
+    Create a dataset mixture from configuration:
+
+    >>> from opentau.datasets.factory import make_dataset_mixture
+    >>> mixture = make_dataset_mixture(train_cfg)
+    >>> dataloader = mixture.get_dataloader()
+
+    Load a single dataset:
+
+    >>> from opentau.datasets.factory import make_dataset
+    >>> dataset = make_dataset(dataset_cfg, train_cfg)
+
+    Access grounding datasets:
+
+    >>> from opentau import available_grounding_datasets
+    >>> print(list(available_grounding_datasets.keys()))
+    ['clevr', 'cocoqa', 'dummy', 'pixmo', 'vsr']
+"""

src/opentau/datasets/compute_stats.py

Lines changed: 16 additions & 10 deletions
@@ -42,16 +42,22 @@
     weighted variance across multiple statistics.
 
 Functions:
-    estimate_num_samples: Heuristic to estimate optimal number of samples
-        based on dataset size.
-    sample_indices: Generate evenly spaced sample indices from a dataset.
-    auto_downsample_height_width: Automatically downsample large images.
-    sample_images: Load and downsample a subset of images from file paths.
-    get_feature_stats: Compute statistical measures for an array.
-    compute_episode_stats: Compute statistics for a single episode.
-    aggregate_feature_stats: Aggregate statistics for a feature across
-        multiple episodes.
-    aggregate_stats: Aggregate statistics from multiple episodes/datasets.
+    estimate_num_samples
+        Heuristic to estimate optimal number of samples based on dataset size.
+    sample_indices
+        Generate evenly spaced sample indices from a dataset.
+    auto_downsample_height_width
+        Automatically downsample large images.
+    sample_images
+        Load and downsample a subset of images from file paths.
+    get_feature_stats
+        Compute statistical measures for an array.
+    compute_episode_stats
+        Compute statistics for a single episode.
+    aggregate_feature_stats
+        Aggregate statistics for a feature across multiple episodes.
+    aggregate_stats
+        Aggregate statistics from multiple episodes/datasets.
 
 Example:
    Compute statistics for a single episode:
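The "weighted variance across multiple statistics" this docstring mentions is the standard within-plus-between decomposition. A minimal sketch under assumed inputs; the function name matches the docstring, but the (mean, std, count) tuple layout is illustrative, not the module's actual signature:

```python
import math

def aggregate_feature_stats(episode_stats):
    """Pool per-episode (mean, std, count) tuples into overall mean/std.

    Total variance = count-weighted within-episode variance
                   + count-weighted between-episode variance.
    """
    total = sum(count for _, _, count in episode_stats)
    mean = sum(m * count for m, _, count in episode_stats) / total
    var = sum(
        count * (s ** 2 + (m - mean) ** 2) for m, s, count in episode_stats
    ) / total
    return mean, math.sqrt(var), total

# Two episodes with identical per-episode spread but different means:
# within-variance is 1.0, between-variance is 1.0, so pooled variance is 2.0.
stats = [(0.0, 1.0, 100), (2.0, 1.0, 100)]
mean, std, n = aggregate_feature_stats(stats)
```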

src/opentau/datasets/factory.py

Lines changed: 2 additions & 0 deletions
@@ -101,8 +101,10 @@ def resolve_delta_timestamps(
 
     Returns:
         A 2-tuple containing:
+
         - At index 0, a 4-tuple containing delta timestamps mean, std, lower, and upper bounds for each group.
         - At index 1, a dictionary mapping feature names to their corresponding group and index.
+
         The delta timestamps and group mapping should follow the structure expected by LeRobotDataset.
     """
     group = "input_group"
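Delta timestamps express temporal offsets in seconds that the dataset later turns into frame offsets at its fps. This is not `resolve_delta_timestamps` itself, whose internals are not shown here, but a hedged sketch of the seconds-to-frames conversion such a structure implies:

```python
# Illustrative only: converting per-feature delta timestamps (seconds)
# into frame index offsets at a dataset's fps. The dict layout is an
# assumption for this sketch, not LeRobotDataset's exact structure.

def deltas_to_frame_offsets(delta_timestamps, fps):
    """Map {feature: [dt_seconds, ...]} to {feature: [frame_offset, ...]}."""
    return {
        feature: [round(dt * fps) for dt in deltas]
        for feature, deltas in delta_timestamps.items()
    }

offsets = deltas_to_frame_offsets(
    {"observation.images.camera0": [-0.1, 0.0], "action": [0.0, 0.1, 0.2]},
    fps=10,
)
# At 10 fps, -0.1 s is one frame in the past; 0.2 s is two frames ahead.
```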

src/opentau/datasets/grounding/pixmo.py

Lines changed: 7 additions & 15 deletions
@@ -11,27 +11,18 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-
-"""
-Datasets for Image-Text Point Set grounding tasks.
+"""Datasets for Image-Text Point Set grounding tasks.
 
 This module provides the PIXMO (Pixel-level Manipulation) dataset implementation
-for training vision-language models on part localization and object grounding
-tasks. The dataset contains images with point annotations for object parts,
-enabling models to learn fine-grained spatial understanding.
+for training vision-language models on part localization and object grounding tasks.
+
+The dataset contains images with point annotations for object parts, enabling models
+to learn fine-grained spatial understanding.
 
 The dataset is loaded from HuggingFace (allenai/pixmo-points) and includes
 automatic retry logic for handling image download failures. Point coordinates
 are normalized to a 255x255 grid and formatted as JSON strings in the postfix.
 
-Key Features:
-    - Point set grounding: Provides pixel-level point annotations for object
-      parts with labels.
-    - Robust loading: Automatic retry with random sampling for failed image
-      downloads.
-    - Grid normalization: Converts pixel coordinates to normalized grid space
-      for consistent representation.
-
 Classes:
     PixmoDataset: Dataset class that loads and formats PIXMO data for part
         localization tasks.
@@ -49,7 +40,8 @@
     HTTP_TIMEOUT: HTTP request timeout in seconds.
 
 Example:
-    Use PIXMO dataset in training:
+    Use PIXMO dataset in training::
+
     >>> from opentau.configs.default import DatasetConfig
     >>> cfg = DatasetConfig(grounding="pixmo")
     >>> dataset = make_dataset(cfg, train_cfg)
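The docstring says point coordinates are normalized to a 255x255 grid and formatted as JSON strings in the postfix. A hedged sketch of that normalization; the helper and the JSON field names are illustrative, not PixmoDataset's actual code:

```python
import json

# Illustrative sketch of PIXMO-style grid normalization: pixel (x, y)
# coordinates are rescaled onto a 255x255 grid and serialized as a JSON
# string for the prompt postfix. Field names here are assumptions.

def normalize_points(points, width, height, grid=255):
    """Map pixel (x, y) points onto a grid x grid coordinate space."""
    return [
        {"x": round(x / width * grid), "y": round(y / height * grid)}
        for x, y in points
    ]

points = normalize_points([(320, 240), (0, 480)], width=640, height=480)
postfix = json.dumps(points)
# The image center maps near the grid center regardless of resolution.
```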

src/opentau/datasets/grounding/vsr.py

Lines changed: 5 additions & 5 deletions
@@ -11,7 +11,6 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-
 """VSR (Visual Spatial Reasoning) dataset for true/false statement grounding.
 
 This module provides the VSR dataset implementation for training vision-language
@@ -24,10 +23,10 @@
 formatted as grounding tasks with true/false labels.
 
 Key Features:
-    - Spatial reasoning: Tests understanding of spatial relationships between
+    * Spatial reasoning: Tests understanding of spatial relationships between
       objects in images.
-    - Binary classification: Simple true/false format for clear learning signal.
-    - Robust loading: Automatic retry with random sampling for failed image
+    * Binary classification: Simple true/false format for clear learning signal.
+    * Robust loading: Automatic retry with random sampling for failed image
       downloads.
 
 Classes:
@@ -44,7 +43,8 @@
     HTTP_TIMEOUT: HTTP request timeout in seconds.
 
 Example:
-    Use VSR dataset in training:
+    Use VSR dataset in training::
+
    >>> from opentau.configs.default import DatasetConfig
    >>> cfg = DatasetConfig(grounding="vsr")
    >>> dataset = make_dataset(cfg, train_cfg)
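Both the VSR and PIXMO docstrings describe "automatic retry with random sampling for failed image downloads". A minimal sketch of that loading pattern; the function names and retry budget are assumptions, not the modules' actual code:

```python
import random

# Illustrative retry-with-random-resampling loader: if fetching an image
# fails, fall back to a randomly chosen other row instead of crashing the
# data loader. fetch_image is a stand-in for the real HTTP download.

def load_with_retry(index, fetch_image, num_rows, max_retries=5, seed=None):
    rng = random.Random(seed)
    for _ in range(max_retries):
        try:
            return fetch_image(index)
        except OSError:
            # Failed download: resample a random row and try again.
            index = rng.randrange(num_rows)
    raise RuntimeError("exceeded retry budget for image downloads")

# Toy fetcher that fails on its first two calls, then succeeds.
calls = []
def flaky_fetch(i):
    calls.append(i)
    if len(calls) < 3:
        raise OSError("download failed")
    return f"image-{i}"

sample = load_with_retry(0, flaky_fetch, num_rows=10, seed=0)
```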

src/opentau/datasets/image_writer.py

Lines changed: 15 additions & 10 deletions
@@ -21,6 +21,7 @@
 robots and recording data at high frame rates without blocking the main process.
 
 The module supports two execution models:
+
 1. Threading mode (num_processes=0): Creates a pool of worker threads
    for concurrent image writing within a single process.
 2. Multiprocessing mode (num_processes>0): Creates multiple processes,
@@ -39,18 +40,22 @@
     even when exceptions occur.
 
 Classes:
-    AsyncImageWriter: Main class for asynchronous image writing with
-        configurable threading or multiprocessing backends.
+
+    AsyncImageWriter
+        Main class for asynchronous image writing with configurable threading
+        or multiprocessing backends.
 
 Functions:
-    image_array_to_pil_image: Convert numpy array to PIL Image with format
-        and type conversion.
-    write_image: Write an image (numpy array or PIL Image) to disk.
-    worker_thread_loop: Worker thread loop for processing image write queue.
-    worker_process: Worker process that manages multiple threads for image
-        writing.
-    safe_stop_image_writer: Decorator to safely stop image writer on
-        exceptions.
+    image_array_to_pil_image
+        Convert numpy array to PIL Image with format and type conversion.
+    write_image
+        Write an image (numpy array or PIL Image) to disk.
+    worker_thread_loop
+        Worker thread loop for processing image write queue.
+    worker_process
+        Worker process that manages multiple threads for image writing.
+    safe_stop_image_writer
+        Decorator to safely stop image writer on exceptions.
 
 Example:
     Create an async image writer with threading:
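The threading mode described above (num_processes=0, a pool of worker threads fed by a queue) can be sketched with the standard library alone. This toy version writes raw bytes rather than encoding PIL images, and the class and method names are illustrative, not AsyncImageWriter's API:

```python
import os
import queue
import tempfile
import threading
from pathlib import Path

class TinyAsyncWriter:
    """Toy queue-fed thread pool: save() never blocks on disk I/O."""

    def __init__(self, num_threads=4):
        self.q = queue.Queue()
        self.threads = [
            threading.Thread(target=self._loop, daemon=True)
            for _ in range(num_threads)
        ]
        for t in self.threads:
            t.start()

    def _loop(self):
        while True:
            item = self.q.get()
            if item is None:  # poison pill: shut this worker down
                break
            path, data = item
            Path(path).write_bytes(data)
            self.q.task_done()

    def save(self, path, data):
        self.q.put((path, data))  # enqueue and return immediately

    def stop(self):
        self.q.join()  # wait for all queued writes to finish
        for _ in self.threads:
            self.q.put(None)
        for t in self.threads:
            t.join()

tmp = tempfile.mkdtemp()
writer = TinyAsyncWriter(num_threads=2)
for i in range(8):
    writer.save(os.path.join(tmp, f"frame_{i}.bin"), bytes([i]))
writer.stop()  # all 8 files are on disk once stop() returns
```

The real module layers PIL encoding, a multiprocessing backend, and the `safe_stop_image_writer` decorator on top of this basic pattern.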
