Commit e00df3a

Rename grounding to vqa (#124)
1 parent 9c88884 commit e00df3a

19 files changed (+141 −142 lines)

configs/dev/ci_config.json (1 addition, 1 deletion)

@@ -2,7 +2,7 @@
   "dataset_mixture": {
     "datasets": [
       {
-        "grounding": "dummy"
+        "vqa": "dummy"
       },
       {
         "repo_id": "lerobot/droid_100",

configs/examples/value_config.json (1 addition, 1 deletion)

@@ -5,7 +5,7 @@
       "repo_id": "physical-intelligence/libero"
     },
     {
-      "grounding": "clevr"
+      "vqa": "clevr"
     }
   ],
   "weights": [
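Each entry in the `datasets` list selects a dataset type by which key it sets: `repo_id` for a LeRobot dataset, `vqa` for a VQA dataset. A minimal sketch of how such a mixture config can be classified — `classify` is a hypothetical helper for illustration, not part of OpenTau:

```python
import json

# Abridged mixture config mirroring configs/examples/value_config.json.
CONFIG = json.loads("""
{
  "dataset_mixture": {
    "datasets": [
      {"repo_id": "physical-intelligence/libero"},
      {"vqa": "clevr"}
    ]
  }
}
""")

def classify(entry: dict) -> str:
    """Return 'lerobot' or 'vqa' depending on which key the entry sets."""
    # Setting both or neither key is invalid, matching DatasetConfig's rule.
    if ("repo_id" in entry) == ("vqa" in entry):
        raise ValueError("Exactly one of `repo_id` or `vqa` should be set.")
    return "lerobot" if "repo_id" in entry else "vqa"

kinds = [classify(e) for e in CONFIG["dataset_mixture"]["datasets"]]
```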

docs/source/concepts.rst (2 additions, 2 deletions)

@@ -16,7 +16,7 @@ The dataset format is versioned (currently v2.1) and utilizes parquet files for
 There are currently two types of datasets:

 * ``LeRobotDataset``: For robotic data.
-* ``GroundingDataset``: For VLM datasets such as Visual Question Answering (VQA) or visual grounding.
+* ``VQADataset``: For VLM training datasets.

 These datasets are used to train policies.

@@ -25,7 +25,7 @@ DatasetMixture
 To train policies on multiple datasets simultaneously, OpenTau uses ``opentau.datasets.dataset_mixture.WeightedDatasetMixture``.
 This class:

-* Combines multiple ``LeRobotDataset`` and ``GroundingDataset`` instances.
+* Combines multiple ``LeRobotDataset`` and ``VQADataset`` instances.
 * Different weights can be assigned to each dataset to control the sampling frequency.
 * Aggregates statistics from all constituent datasets to ensure consistent normalization across the mixture.
 * Resamples the action output frequency to match the action frequency specified in the configuration.
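The weighted-sampling behavior described above can be sketched with stdlib tools. This is a toy illustration of the idea, not the actual ``WeightedDatasetMixture`` implementation; dataset contents and weights are made up:

```python
import random

# Stand-in datasets: any indexable collections, deliberately different sizes.
robot_ds = ["robot_sample"] * 100
vqa_ds = ["vqa_sample"] * 300

def sample_mixture(datasets, weights, n, seed=0):
    """Draw n items; weights control how often each dataset is chosen,
    independent of its size."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        ds = rng.choices(datasets, weights=weights, k=1)[0]
        out.append(ds[rng.randrange(len(ds))])
    return out

# The smaller robot dataset dominates the batch because of its 0.8 weight.
batch = sample_mixture([robot_ds, vqa_ds], weights=[0.8, 0.2], n=1000)
```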

src/opentau/__init__.py (3 additions, 3 deletions)

@@ -25,7 +25,7 @@
 - `available_tasks_per_env`: Mapping of environments to their available tasks
 - `available_datasets_per_env`: Mapping of environments to their compatible datasets
 - `available_real_world_datasets`: List of real-world robot datasets
-- `available_grounding_datasets`: Registry for grounding datasets (populated via decorator)
+- `available_vqa_datasets`: Registry for VQA datasets (populated via decorator)
 - `available_policies`: List of available policy types (e.g., "pi0", "pi05", "value")
 - `available_policies_per_env`: Mapping of environments to their compatible policies

@@ -142,7 +142,7 @@
     "lerobot/usc_cloth_sim",
 ]

-available_grounding_datasets = {}
+available_vqa_datasets = {}

 available_datasets = sorted(
     set(itertools.chain(*available_datasets_per_env.values(), available_real_world_datasets))

@@ -177,4 +177,4 @@ def decorator(cls):
     return register


-register_grounding_dataset = registry_factory(available_grounding_datasets)
+register_vqa_dataset = registry_factory(available_vqa_datasets)
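The diff implies a decorator-registry pattern: `registry_factory(registry)` yields a decorator that records classes in the given dict, so `available_vqa_datasets` fills up as dataset modules are imported for side effects. A minimal sketch of one plausible shape of that pattern (not OpenTau's exact code):

```python
# registry_factory builds a register decorator bound to a specific dict.
def registry_factory(registry: dict):
    def register(name: str):
        def decorator(cls):
            registry[name] = cls  # record the class under the given name
            return cls            # leave the class itself unchanged
        return decorator
    return register

available_vqa_datasets = {}
register_vqa_dataset = registry_factory(available_vqa_datasets)

# Importing a module containing this definition populates the registry.
@register_vqa_dataset("dummy")
class DummyVQADataset:
    pass
```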

src/opentau/configs/default.py (7 additions, 7 deletions)

@@ -53,9 +53,9 @@ class DatasetConfig:

     Args:
         repo_id: HuggingFace repository ID for the dataset. Exactly one of
-            `repo_id` or `grounding` must be set.
-        grounding: Grounding dataset identifier. Exactly one of `repo_id` or
-            `grounding` must be set.
+            `repo_id` or `vqa` must be set.
+        vqa: VQA dataset identifier. Exactly one of `repo_id` or
+            `vqa` must be set.
         root: Root directory where the dataset will be stored (e.g. 'dataset/path').
             Defaults to None.
         episodes: List of episode indices to use from the dataset. If None, all

@@ -72,13 +72,13 @@ class DatasetConfig:
             standard feature names. Defaults to None.

     Raises:
-        ValueError: If both or neither of `repo_id` and `grounding` are set, or
+        ValueError: If both or neither of `repo_id` and `vqa` are set, or
             if `data_features_name_mapping` is provided.
             is provided.
     """

     repo_id: str | None = None
-    grounding: str | None = None
+    vqa: str | None = None
     # Root directory where the dataset will be stored (e.g. 'dataset/path').
     root: str | None = None
     episodes: list[int] | None = None

@@ -98,8 +98,8 @@ class DatasetConfig:

     def __post_init__(self):
         """Validate dataset configuration and register custom mappings if provided."""
-        if (self.repo_id is None) == (self.grounding is None):
-            raise ValueError("Exactly one of `repo_id` or `grounding` for Dataset config should be set.")
+        if (self.repo_id is None) == (self.vqa is None):
+            raise ValueError("Exactly one of `repo_id` or `vqa` for Dataset config should be set.")

         # data_features_name_mapping have to be provided if it is not already in standard_data_format_mapping.py

src/opentau/datasets/__init__.py

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -16,13 +16,13 @@
1616
This module provides a comprehensive toolkit for loading, creating, managing, and
1717
processing datasets for training vision-language-action (VLA) models. It supports
1818
both robot learning datasets (with actions and states) and vision-language
19-
grounding datasets (for multimodal understanding tasks).
19+
vqa datasets (for multimodal understanding tasks).
2020
2121
The module is organized into several key components:
2222
2323
- **Core Datasets**: LeRobotDataset for robot learning data with support for
2424
temporal alignment, multi-modal data, and version compatibility.
25-
- **Grounding Datasets**: Vision-language datasets (CLEVR, COCO-QA, PIXMO, VSR)
25+
- **VQA Datasets**: Vision-language datasets (CLEVR, COCO-QA, PIXMO, VSR)
2626
for training visual understanding without robot actions.
2727
- **Dataset Mixtures**: WeightedDatasetMixture for combining multiple datasets
2828
with controlled sampling proportions.
@@ -53,7 +53,7 @@
5353
Main Modules:
5454
5555
- **lerobot_dataset**: Core dataset implementation for robot learning data.
56-
- **grounding**: Vision-language grounding datasets (CLEVR, COCO-QA, PIXMO, VSR).
56+
- **vqa**: Vision-language vqa datasets (CLEVR, COCO-QA, PIXMO, VSR).
5757
- **dataset_mixture**: Weighted combination of multiple datasets.
5858
- **factory**: Factory functions for creating datasets from configurations.
5959
- **utils**: Utility functions for I/O, metadata management, and validation.
@@ -76,9 +76,9 @@
7676
>>> from opentau.datasets.factory import make_dataset
7777
>>> dataset = make_dataset(dataset_cfg, train_cfg)
7878
79-
Access grounding datasets:
79+
Access vqa datasets:
8080
81-
>>> from opentau import available_grounding_datasets
82-
>>> print(list(available_grounding_datasets.keys()))
81+
>>> from opentau import available_vqa_datasets
82+
>>> print(list(available_vqa_datasets.keys()))
8383
['clevr', 'cocoqa', 'dummy', 'pixmo', 'vsr']
8484
"""

src/opentau/datasets/factory.py (17 additions, 18 deletions)

@@ -25,7 +25,7 @@
 The factory supports two types of datasets:
 1. LeRobot datasets: Standard robot learning datasets loaded from HuggingFace
    repositories with configurable delta timestamps for temporal alignment.
-2. Grounding datasets: Vision-language grounding datasets (CLEVR, COCO-QA,
+2. VQA datasets: Vision-language VQA datasets (CLEVR, COCO-QA,
    PIXMO, VSR, etc.) for multimodal learning tasks.

 Key Features:

@@ -36,7 +36,7 @@
    during dataset creation.
 - Imagenet stats override: Optionally replaces dataset statistics with
   ImageNet normalization statistics for camera features.
-- Grounding dataset registration: Supports extensible grounding dataset
+- VQA dataset registration: Supports extensible VQA dataset
   registration through side-effect imports.

 Functions:

@@ -68,12 +68,12 @@
 import torch

 # NOTE: Don't delete; imported for side effects.
-import opentau.datasets.grounding.clevr  # noqa: F401
-import opentau.datasets.grounding.cocoqa  # noqa: F401
-import opentau.datasets.grounding.dummy  # noqa: F401
-import opentau.datasets.grounding.pixmo  # noqa: F401
-import opentau.datasets.grounding.vsr  # noqa: F401
-from opentau import available_grounding_datasets
+import opentau.datasets.vqa.clevr  # noqa: F401
+import opentau.datasets.vqa.cocoqa  # noqa: F401
+import opentau.datasets.vqa.dummy  # noqa: F401
+import opentau.datasets.vqa.pixmo  # noqa: F401
+import opentau.datasets.vqa.vsr  # noqa: F401
+from opentau import available_vqa_datasets
 from opentau.configs.default import DatasetConfig
 from opentau.configs.train import TrainPipelineConfig
 from opentau.datasets.dataset_mixture import WeightedDatasetMixture

@@ -169,23 +169,22 @@ def make_dataset(
         "episode_end_idx", "current_idx", "last_step", "episode_index", and "timestamp". Defaults to False.

     Raises:
-        ValueError: If exactly one of `cfg.grounding` and `cfg.repo_id` is not provided.
-        ValueError: If `cfg.grounding` is not a supported grounding dataset.
+        ValueError: If exactly one of `cfg.vqa` and `cfg.repo_id` is not provided.
+        ValueError: If `cfg.vqa` is not a supported VQA dataset.

     Returns:
         BaseDataset or Tuple[BaseDataset, BaseDataset]: A single dataset or a tuple of (train_dataset, val_dataset) if val_freq > 0.
     """
     image_transforms = ImageTransforms(cfg.image_transforms) if cfg.image_transforms.enable else None

-    if isinstance(cfg.grounding, str) + isinstance(cfg.repo_id, str) != 1:
-        raise ValueError("Exactly one of `cfg.grounding` and `cfg.repo_id` should be provided.")
+    if isinstance(cfg.vqa, str) + isinstance(cfg.repo_id, str) != 1:
+        raise ValueError("Exactly one of `cfg.vqa` and `cfg.repo_id` should be provided.")

-    if isinstance(cfg.grounding, str):
-        ds_cls = available_grounding_datasets.get(cfg.grounding)
+    if isinstance(cfg.vqa, str):
+        ds_cls = available_vqa_datasets.get(cfg.vqa)
         if ds_cls is None:
             raise ValueError(
-                f"Unknown grounding dataset '{cfg.grounding}'. "
-                f"Supported datasets are: {available_grounding_datasets.keys()}"
+                f"Unknown vqa dataset '{cfg.vqa}'. Supported datasets are: {available_vqa_datasets.keys()}"
             )
         # TODO support dataset-specific arg / kwargs
         dataset = ds_cls(train_cfg)

@@ -210,8 +209,8 @@ def make_dataset(
         return_advantage_input=return_advantage_input,
     )

-    # TODO grounding datasets implement stats in original feature names, but camera_keys are standardized names
-    if not isinstance(cfg.grounding, str) and "dummy" not in cfg.repo_id and cfg.use_imagenet_stats:
+    # TODO vqa datasets implement stats in original feature names, but camera_keys are standardized names
+    if not isinstance(cfg.vqa, str) and "dummy" not in cfg.repo_id and cfg.use_imagenet_stats:
         for key in dataset.meta.camera_keys:
             for stats_type, stats in IMAGENET_STATS.items():
                 if key not in dataset.meta.stats:
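The dispatch logic in `make_dataset` above uses two small tricks: summing two `isinstance` booleans counts how many of the mutually exclusive fields are set, and an unknown `vqa` name produces an error listing the supported registry keys. A self-contained sketch of just that logic (the registry contents and return values here are illustrative, not OpenTau's actual classes):

```python
# Hypothetical registry standing in for opentau's available_vqa_datasets.
available_vqa_datasets = {"clevr": object, "cocoqa": object}

def resolve(vqa=None, repo_id=None):
    # isinstance(...) returns a bool; summing two bools counts how many are set.
    if isinstance(vqa, str) + isinstance(repo_id, str) != 1:
        raise ValueError("Exactly one of `cfg.vqa` and `cfg.repo_id` should be provided.")
    if isinstance(vqa, str):
        ds_cls = available_vqa_datasets.get(vqa)
        if ds_cls is None:
            raise ValueError(
                f"Unknown vqa dataset '{vqa}'. Supported datasets are: {list(available_vqa_datasets)}"
            )
        return ("vqa", ds_cls)
    return ("lerobot", repo_id)

kind, _ = resolve(vqa="clevr")
```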

src/opentau/datasets/lerobot_dataset.py (5 additions, 5 deletions)

@@ -52,8 +52,8 @@
     Metadata manager for LeRobot datasets with Hub integration, version
     checking, and statistics loading.

-GroundingDatasetMetadata
-    Metadata manager for grounding datasets.
+VQADatasetMetadata
+    Metadata manager for VQA datasets.

 BaseDataset
     Base PyTorch Dataset class with common functionality.

@@ -259,8 +259,8 @@ def shapes(self) -> dict:
         return {key: tuple(ft["shape"]) for key, ft in self.features.items()}


-class GroundingDatasetMetadata(DatasetMetadata):
-    """Metadata class for grounding datasets (vision-language datasets)."""
+class VQADatasetMetadata(DatasetMetadata):
+    """Metadata class for VQA datasets (vision-language datasets)."""

     pass

@@ -585,7 +585,7 @@ class BaseDataset(torch.utils.data.Dataset):
     """Base class for all robot learning datasets.

     This abstract base class provides common functionality for both LeRobotDataset
-    and GroundingDataset, including data format standardization, image processing,
+    and VQADataset, including data format standardization, image processing,
     and vector padding. It ensures all datasets conform to a standard format
     regardless of their source or structure.

src/opentau/datasets/grounding/__init__.py renamed to src/opentau/datasets/vqa/__init__.py (16 additions, 16 deletions)

@@ -12,56 +12,56 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-"""Vision-language grounding datasets for multimodal learning.
+"""Vision-language VQA datasets for multimodal learning.

 This module provides datasets for training vision-language-action models on
-image-text grounding tasks without requiring robot actions. Grounding datasets
+image-text VQA tasks without requiring robot actions. VQA datasets
 are designed to help models learn visual understanding, spatial reasoning,
-and language grounding capabilities that can be transferred to robotic tasks.
+and language understanding capabilities that can be transferred to robotic tasks.

-Grounding datasets differ from standard robot learning datasets in that they:
+VQA datasets differ from standard robot learning datasets in that they:
 - Provide images, prompts, and responses but no robot actions or states
 - Use zero-padding for state and action features to maintain compatibility
-- Focus on visual question answering, spatial reasoning, and object grounding
+- Focus on visual question answering, spatial reasoning, and object localization
 - Enable training on large-scale vision-language data without robot hardware

 The module uses a registration system where datasets are registered via the
-`@register_grounding_dataset` decorator, making them available through the
-`available_grounding_datasets` registry.
+`@register_vqa_dataset` decorator, making them available through the
+`available_vqa_datasets` registry.

 Available Datasets:
 - CLEVR: Compositional Language and Elementary Visual Reasoning dataset
   for visual question answering with synthetic scenes.
 - COCO-QA: Visual question answering dataset based on COCO images,
   filtered for spatial reasoning tasks.
-- PIXMO: Pixel-level manipulation grounding dataset for object
+- PIXMO: Pixel-level manipulation VQA dataset for object
   localization and manipulation tasks.
 - VSR: Visual Spatial Reasoning dataset for true/false statement
-  grounding about spatial relationships in images.
+  VQA about spatial relationships in images.
 - dummy: Synthetic test dataset with simple black, white, and gray
   images for testing infrastructure.

 Classes:
-    GroundingDataset: Base class for all grounding datasets, providing
+    VQADataset: Base class for all VQA datasets, providing
     common functionality for metadata creation, data format conversion,
     and zero-padding of state/action features.

 Modules:
-    base: Base class and common functionality for grounding datasets.
+    base: Base class and common functionality for VQA datasets.
     clevr: CLEVR dataset implementation.
     cocoqa: COCO-QA dataset implementation.
     dummy: Dummy test dataset implementation.
     pixmo: PIXMO dataset implementation.
     vsr: VSR dataset implementation.

 Example:
-    Use a grounding dataset in training configuration:
+    Use a VQA dataset in training configuration:
     >>> from opentau.configs.default import DatasetConfig
-    >>> cfg = DatasetConfig(grounding="cocoqa")
+    >>> cfg = DatasetConfig(vqa="cocoqa")
     >>> dataset = make_dataset(cfg, train_cfg)

-    Access available grounding datasets:
-    >>> from opentau import available_grounding_datasets
-    >>> print(list(available_grounding_datasets.keys()))
+    Access available VQA datasets:
+    >>> from opentau import available_vqa_datasets
+    >>> print(list(available_vqa_datasets.keys()))
     ['clevr', 'cocoqa', 'dummy', 'pixmo', 'vsr']
 """
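The docstring above notes that VQA datasets zero-pad state and action features so action-free samples stay batch-compatible with robot samples. A minimal illustration of that idea — the dimensions and the `pad_vqa_sample` helper are made up for the sketch, not OpenTau's actual shapes or API:

```python
# VQA samples carry image/prompt/response but no robot state or actions, so
# state and action features are zero-padded to the shapes robot samples use.
STATE_DIM = 8       # hypothetical robot state dimension
ACTION_HORIZON = 4  # hypothetical action-chunk length
ACTION_DIM = 7      # hypothetical per-step action dimension

def pad_vqa_sample(sample: dict) -> dict:
    """Return a copy of the sample with zeroed state/action features added."""
    padded = dict(sample)
    padded["state"] = [0.0] * STATE_DIM
    padded["actions"] = [[0.0] * ACTION_DIM for _ in range(ACTION_HORIZON)]
    return padded

item = pad_vqa_sample({"prompt": "How many cubes are red?", "response": "2"})
```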
