src/opentau/datasets/grounding: 2 files changed, +0 -61 lines changed

 This module provides the PIXMO (Pixel-level Manipulation) dataset implementation
 for training vision-language models on part localization and object grounding tasks.
-
-The dataset contains images with point annotations for object parts, enabling models
-to learn fine-grained spatial understanding.
-
-The dataset is loaded from HuggingFace (allenai/pixmo-points) and includes
-automatic retry logic for handling image download failures. Point coordinates
-are normalized to a 255x255 grid and formatted as JSON strings in the postfix.
-
-Classes:
-    PixmoDataset: Dataset class that loads and formats PIXMO data for part
-        localization tasks.
-
-Functions:
-    _pil_from_url: Download and decode an image from URL with retry logic.
-    _get_post_fix: Convert point coordinates to normalized grid format and
-        format as JSON string.
-    _img_to_normalized_tensor: Convert PIL Image to normalized torch tensor.
-
-Constants:
-    IMG_SIZE: Target image size (224x224).
-    POINT_GRID: Grid size for point normalization (255x255).
-    MAX_RETRIES: Maximum HTTP retry attempts.
-    HTTP_TIMEOUT: HTTP request timeout in seconds.
-
-Example:
-    Use PIXMO dataset in training::
-
-        >>> from opentau.configs.default import DatasetConfig
-        >>> cfg = DatasetConfig(grounding="pixmo")
-        >>> dataset = make_dataset(cfg, train_cfg)
 """

 import json
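
Since the removed docstring was the only place that described the point format, a rough sketch of "normalized to a 255x255 grid and formatted as JSON" may help reviewers. This is an illustration, not the module's `_get_post_fix`: the function name `points_to_postfix`, the pixel-space input, the `{"x": ..., "y": ...}` layout, and the choice of integer cells in [0, 254] are all assumptions.

```python
import json

POINT_GRID = 255  # grid size named in the docstring; the scaling rule below is assumed


def points_to_postfix(points, width, height, grid=POINT_GRID):
    """Map pixel-space (x, y) points to integer grid cells and dump them as JSON.

    Hypothetical helper: each coordinate is scaled by grid / image size and
    clamped to [0, grid - 1], so a point on the right or bottom edge stays
    inside the grid.
    """
    cells = []
    for x, y in points:
        gx = min(grid - 1, max(0, int(x / width * grid)))
        gy = min(grid - 1, max(0, int(y / height * grid)))
        cells.append({"x": gx, "y": gy})
    return json.dumps(cells)


if __name__ == "__main__":
    # Three points on a 640x480 image: top-left, center, bottom-right.
    print(points_to_postfix([(0, 0), (320, 240), (640, 480)], 640, 480))
```

Clamping keeps out-of-range annotations from producing cells outside the grid; whether the real implementation clamps, rounds, or rejects such points is not visible in this diff.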
 models on visual spatial reasoning tasks. The dataset contains images with
 statements about spatial relationships, and models must determine whether each
 statement is true or false based on the image content.
-
-The dataset is loaded from HuggingFace (cambridgeltl/vsr_random) and includes
-automatic retry logic for handling image download failures. Statements are
-formatted as grounding tasks with true/false labels.
-
-Key Features:
-    * Spatial reasoning: Tests understanding of spatial relationships between
-      objects in images.
-    * Binary classification: Simple true/false format for clear learning signal.
-    * Robust loading: Automatic retry with random sampling for failed image
-      downloads.
-
-Classes:
-    VSRDataset: Dataset class that loads and formats VSR data for true/false
-        spatial reasoning tasks.
-
-Functions:
-    _pil_from_url: Download and decode an image from URL with retry logic.
-    _img_to_normalized_tensor: Convert PIL Image to normalized torch tensor
-        with channel-first format and [0, 1] normalization.
-
-Constants:
-    MAX_RETRIES: Maximum HTTP retry attempts.
-    HTTP_TIMEOUT: HTTP request timeout in seconds.
-
-Example:
-    Use VSR dataset in training::
-
-        >>> from opentau.configs.default import DatasetConfig
-        >>> cfg = DatasetConfig(grounding="vsr")
-        >>> dataset = make_dataset(cfg, train_cfg)
 """

 import logging
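
The "automatic retry with random sampling for failed image downloads" behavior mentioned in the removed text could look roughly like the sketch below. It is not taken from the module: `fetch_with_resampling`, the `load_image` callback, the `"url"` key, and the value of `MAX_RETRIES` are all hypothetical, and how the real code combines HTTP retries with resampling is not shown in this diff.

```python
import random

MAX_RETRIES = 3  # assumed value; the docstring names the constant but not its value


def fetch_with_resampling(samples, index, load_image, max_retries=MAX_RETRIES, rng=None):
    """Load the image for samples[index]; on failure, try a randomly chosen sample.

    Hypothetical helper. `load_image` is any callable that takes a URL and
    either returns an image or raises OSError (urllib's URLError and the
    `requests` exceptions are both OSError subclasses).
    """
    if rng is None:
        rng = random.Random()
    last_error = None
    for _ in range(max_retries):
        sample = samples[index]
        try:
            return sample, load_image(sample["url"])
        except OSError as exc:
            last_error = exc
            # Swap in a random replacement instead of hammering a dead URL.
            index = rng.randrange(len(samples))
    raise RuntimeError(f"no loadable image after {max_retries} attempts") from last_error
```

Returning the sample together with the image matters here: after a resample, the label must come from the sample that was actually loaded, not from the one originally requested.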