# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Dataset management and processing utilities for robot learning and vision-language tasks.

This module provides a comprehensive toolkit for loading, creating, managing, and
processing datasets for training vision-language-action (VLA) models. It supports
both robot learning datasets (with actions and states) and vision-language
grounding datasets (for multimodal understanding tasks).

The module is organized into several key components:

    - **Core Datasets**: LeRobotDataset for robot learning data with support for
      temporal alignment, multi-modal data, and version compatibility.
    - **Grounding Datasets**: Vision-language datasets (CLEVR, COCO-QA, PIXMO, VSR)
      for training visual understanding without robot actions.
    - **Dataset Mixtures**: WeightedDatasetMixture for combining multiple datasets
      with controlled sampling proportions.
    - **Data Processing**: Utilities for statistics computation, image/video
      handling, transforms, and format standardization.
    - **Factory Functions**: High-level functions for creating datasets and mixtures
      from configuration objects.

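The mixture idea can be sketched in a few lines of plain Python. This is an
illustrative sketch only: the names ``sample_mixture``, ``datasets``, and
``weights`` are hypothetical, and the real WeightedDatasetMixture additionally
handles batching, workers, and statistics.

```python
import random

def sample_mixture(datasets, weights, n_samples, seed=0):
    """Draw n_samples items, picking a dataset per draw in proportion to its weight."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # Choose which dataset to draw from according to the mixture weights.
        (dataset,) = rng.choices(datasets, weights=weights, k=1)
        # Then draw one item uniformly from the chosen dataset.
        samples.append(rng.choice(dataset))
    return samples

# Example: dataset A is sampled roughly twice as often as dataset B.
mix = sample_mixture([["a1", "a2"], ["b1", "b2"]], weights=[2.0, 1.0], n_samples=9)
```
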
Key Features:

    - **HuggingFace Integration**: Seamless loading from the HuggingFace Hub with
      automatic version checking and backward compatibility.
    - **Temporal Alignment**: Delta timestamps enable sampling features at
      different time offsets, with optional Gaussian noise for data augmentation.
    - **Multi-modal Support**: Handles images, videos, state vectors, actions,
      and text prompts with automatic format conversion.
    - **Weighted Sampling**: Combine heterogeneous datasets with configurable
      sampling weights for balanced training.
    - **Standard Data Format**: Unified data format across all datasets for
      consistent model input/output interfaces.
    - **Statistics Management**: Automatic computation and aggregation of dataset
      statistics for normalization.
    - **Video Handling**: Multiple video backends (torchcodec, pyav, video_reader)
      for efficient frame extraction and encoding.
    - **Asynchronous I/O**: High-performance image writing for real-time data
      recording without blocking.

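The delta-timestamp alignment above can be illustrated with a small,
self-contained sketch. The helper ``align_deltas`` and its tolerance default
are hypothetical and stand in for, rather than reproduce, the actual
LeRobotDataset implementation.

```python
def align_deltas(timestamps, t, deltas, tol=0.04):
    """Return the index of the frame closest to t + delta, for each delta.

    Raises ValueError if the closest frame is farther than `tol` seconds away.
    """
    indices = []
    for delta in deltas:
        target = t + delta
        # Nearest-neighbor lookup over the recorded frame timestamps.
        idx = min(range(len(timestamps)), key=lambda i: abs(timestamps[i] - target))
        if abs(timestamps[idx] - target) > tol:
            raise ValueError(f"no frame within {tol}s of t + {delta}")
        indices.append(idx)
    return indices

# Frames recorded at 10 Hz; fetch the previous, current, and next frame.
ts = [i / 10 for i in range(100)]  # 0.0, 0.1, ..., 9.9
idx = align_deltas(ts, t=0.5, deltas=[-0.1, 0.0, 0.1])  # -> [4, 5, 6]
```
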
Main Modules:

    - **lerobot_dataset**: Core dataset implementation for robot learning data.
    - **grounding**: Vision-language grounding datasets (CLEVR, COCO-QA, PIXMO, VSR).
    - **dataset_mixture**: Weighted combination of multiple datasets.
    - **factory**: Factory functions for creating datasets from configurations.
    - **utils**: Utility functions for I/O, metadata management, and validation.
    - **compute_stats**: Statistics computation and aggregation utilities.
    - **transforms**: Image transformation pipelines for data augmentation.
    - **video_utils**: Video encoding, decoding, and metadata extraction.
    - **image_writer**: Asynchronous image writing for high-frequency recording.
    - **sampler**: Episode-aware sampling with boundary frame filtering.
    - **standard_data_format_mapping**: Feature name and loss type mappings.

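As a rough illustration of the kind of aggregation that statistics utilities
perform, per-dataset moments can be pooled exactly via E[x^2] = var + mean^2.
The helper below is a hypothetical sketch, not the compute_stats module's API.

```python
import math

def aggregate_stats(stats):
    """Pool (count, mean, std) triples into a single (count, mean, std)."""
    total = sum(n for n, _, _ in stats)
    # Pooled mean is the count-weighted average of the per-dataset means.
    mean = sum(n * m for n, m, _ in stats) / total
    # Pooled second moment, from which the pooled variance is recovered.
    second = sum(n * (s * s + m * m) for n, m, s in stats) / total
    return total, mean, math.sqrt(second - mean * mean)

# Two datasets with identical spread but different means widen the pooled std.
n, mu, sigma = aggregate_stats([(100, 0.0, 1.0), (100, 2.0, 1.0)])
```
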
Example:
    Create a dataset mixture from configuration:

    >>> from opentau.datasets.factory import make_dataset_mixture
    >>> mixture = make_dataset_mixture(train_cfg)
    >>> dataloader = mixture.get_dataloader()

    Load a single dataset:

    >>> from opentau.datasets.factory import make_dataset
    >>> dataset = make_dataset(dataset_cfg, train_cfg)

    Access grounding datasets:

    >>> from opentau import available_grounding_datasets
    >>> print(list(available_grounding_datasets.keys()))
    ['clevr', 'cocoqa', 'dummy', 'pixmo', 'vsr']
"""