HOTS is an ongoing effort to collect an RGB-D dataset aimed at benchmarking visual and 3D recognition, sim-to-real domain adaptation, and more in robotics-related domains. It contains a broad range of typical household objects annotated at multiple levels of granularity for class labels (supercategory, category and instance), as well as visual attribute annotations (color, material, etc.) and pair-wise spatial relations at the scene level (scene graphs). The current version of the dataset contains only single-view RGB-D data captured with an ASUS Xtion sensor mounted on a real-robot setup.
The dataset contains a total of 46 object instances, organized as follows:
| Annotation | Number | Classes |
|---|---|---|
| Supercategory | 5 | edible products(13), electronics(5), fruits(6), kitchenware(11), stationery(11) |
| Category | 25 | apple(1), banana(1), book(3), bowl(1), soda can(5), cup(3), fork(2), juice box(3), keyboard(1), knife(1), laptop(1), lemon(1), marker(2), milk box(2), monitor(1), mouse(2), orange(1), peach(1), pear(1), pen(3), plate(2), pringles box(3), scissors(2), spoon(2), stapler(1) |
| Color | 11 | red(6), yellow(4), blue(5), white(6), purple(2), green(4), black(9), transparent(1), silver(6), orange(2), pink(1) |
| Material | 7 | organic(6), paper(7), ceramic(5), aluminium(5), glass(1), metal(8), plastic(14) |
Download and unzip the dataset from here.
Inside the root directory you will find two sub-folders: a) `object`, which contains object-level RGB-D images for the object recognition task, cropped from the original scene frames according to their bounding box annotations, and b) `scene`, which contains scene-level RGB-D images, organized by title into different splits according to the type of objects appearing (table, kitchen, office, mix). Annotations include bounding boxes for object detection and pixel-level masks for semantic / instance segmentation tasks.
The object-level directory structure follows the classic `ImageFolder` class-per-folder style:

```
# - ./HOTS/object
#   - apple
#     - 0.png
#     - 1.png
#     - ...
#   ...
#   - stapler
#     - 0.png
#     - 1.png
#     - ...
```
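Since this layout matches torchvision's `ImageFolder` convention, the object crops can also be loaded directly with torchvision. A minimal sketch, assuming the dataset was unzipped to `./HOTS` as shown above:

```python
# Minimal sketch: load the object crops with torchvision's ImageFolder.
# The ./HOTS/object path is assumed from the tree above.
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # example size; use whatever your model expects
    transforms.ToTensor(),
])

objects = datasets.ImageFolder("./HOTS/object", transform=preprocess)
image, class_idx = objects[0]       # class_idx follows alphabetical folder order
print(objects.classes[class_idx])   # e.g. "apple"
```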
The scene-level directory follows the VOC-style structure for each task.

Image data:

```
# - ./HOTS/scene
#   - RGB
#     - kitchen_5_top_raw_0.png
#     - ...
#     - table_8_top_raw_9.png
```
Object Detection:

```
# - ./HOTS/scene
#   - ObjectDetection
#     - Annotations
#       - kitchen_5_top_raw_0.xml
#       - ...
#     - AnnotationsVisualization
#       - kitchen_5_top_raw_0.jpg
#       - ...
#     - class_names.txt
```
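Since the annotation files follow the VOC style, a single XML can be parsed with the Python standard library. A minimal sketch (field names assume the standard VOC schema; adjust if the files differ):

```python
# Sketch: parse one VOC-style annotation file.
import xml.etree.ElementTree as ET

tree = ET.parse("./HOTS/scene/ObjectDetection/Annotations/kitchen_5_top_raw_0.xml")
for obj in tree.findall("object"):
    name = obj.find("name").text
    box = obj.find("bndbox")
    xmin, ymin, xmax, ymax = (int(float(box.find(t).text))
                              for t in ("xmin", "ymin", "xmax", "ymax"))
    print(name, (xmin, ymin, xmax, ymax))
```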
Semantic / Instance Segmentation:

```
# - ./HOTS/scene
#   - SemanticSegmentation
#     - SegmentationClass
#       - kitchen_5_top_raw_0.npy
#       - ...
#     - SegmentationClassPNG
#       - kitchen_5_top_raw_0.png
#       - ...
#     - SegmentationClassVisualization
#       - kitchen_5_top_raw_0.jpg
#       - ...
#
#   - InstanceSegmentation
#     - SegmentationObject
#       - kitchen_5_top_raw_0.npy
#       - ...
#     - SegmentationObjectPNG
#       - kitchen_5_top_raw_0.png
#       - ...
#     - SegmentationObjectVisualization
#       - kitchen_5_top_raw_0.jpg
#       - ...
#   - class_names.txt
```

The -Class folders contain raw pixel-level masks using the unique index of each class (as ordered in `class_names.txt`) or instance. The -PNG folders omit the background and show a coloured version of the pixel-level annotations. The Visualization folders contain the RGB data with the annotations for each task drawn on top; see the loading sketch after the table below.
| Object Detection | Semantic Segmentation | Instance Segmentation |
|---|---|---|
| ![]() | ![]() | ![]() |
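As a concrete illustration, a raw mask from `SegmentationClass` can be read with NumPy and its pixel values mapped back to names via `class_names.txt`. A minimal sketch, with paths assumed from the tree above (the exact location of `class_names.txt` may differ):

```python
# Sketch: read a raw semantic mask and list the classes present in the scene.
# Paths (and the location of class_names.txt) are assumed from the tree above.
import numpy as np

with open("./HOTS/scene/class_names.txt") as f:
    class_names = [line.strip() for line in f]

mask = np.load("./HOTS/scene/SemanticSegmentation/SegmentationClass/kitchen_5_top_raw_0.npy")
for idx in np.unique(mask):
    print(idx, class_names[idx])  # unique class index -> class name
```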
A Python API is provided for loading the datasets. Use `transform=True` to tensorize the data. Alternatively, pass a `torchvision.transforms` object to preprocess the data according to your desired pipeline.

Object-level:

```python
from hots import load_HOTS_objects
train_dataset, test_dataset = load_HOTS_objects(transform=False)
image, label = train_dataset[0]
# image: np.array[uint8] (H x W x 3, raw pixel values)
# label: str
train_dataset, test_dataset = load_HOTS_objects(transform=True)
image, label = train_dataset[0]
# image: torch.Tensor[float32] (3 x H x W, normalized)
# label: torch.Tensor[int64] (unique index from labels.txt)
```
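For example, a custom torchvision pipeline can be passed instead of `transform=True`; a sketch (the augmentations here are arbitrary choices):

```python
# Sketch: pass a custom torchvision transform instead of transform=True.
from torchvision import transforms
from hots import load_HOTS_objects

my_transform = transforms.Compose([
    transforms.ToPILImage(),            # raw images are H x W x 3 uint8 arrays
    transforms.RandomHorizontalFlip(),  # arbitrary example augmentation
    transforms.ToTensor(),              # float32 in [0, 1], 3 x H x W
])

train_dataset, test_dataset = load_HOTS_objects(transform=my_transform)
image, label = train_dataset[0]         # image: torch.Tensor[float32] (3 x H x W)
```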
Scene-level:

```python
from hots import load_HOTS_scenes
train_dataset, test_dataset = load_HOTS_scenes(transform=False)
image, target = train_dataset[0]
# image: np.array[uint8] (H x W x 3, raw pixel values)
# target: Dict[str, Any]:
# 'boxes' : np.int32 (N x 4, raw bounding box coordinates of the N appearing objects)
# 'labels' : np.int32 (N, unique index of all N appearing objects according to labels.txt order)
# 'class_masks' : np.uint8 (N x H x W, pixel-level mask containing class labels as pixel values)
# 'instance_masks' : np.uint8 (N x H x W, pixel-level mask containing an integer [0, N-1] for each object instance as pixel values)
# 'image_id' : int32 (A unique identifier of the input image)
# 'area' : int32 (The total area of the bounding box, used for COCO-like object detection evaluation)
# 'image_size' : Tuple[int32, int32] (The resolution of the input image)
```

Similarly, use `transform=True` or a pre-defined torchvision transform to preprocess the raw image data. Using `transform=True` will also tensorize the values of the target dictionary and will additionally normalize the bounding box coordinates according to the `image_size` field.
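As a usage sketch, the tensorized scene-level samples can be batched with a simple list-collate, since detection-style targets vary in size per image (the collate function below is an assumption, not part of the API):

```python
# Sketch: batch scene-level samples for a detection-style training loop.
from torch.utils.data import DataLoader
from hots import load_HOTS_scenes

train_dataset, _ = load_HOTS_scenes(transform=True)

def detection_collate(batch):
    # Targets differ in size per image, so keep images and targets as lists.
    images, targets = zip(*batch)
    return list(images), list(targets)

loader = DataLoader(train_dataset, batch_size=4, shuffle=True,
                    collate_fn=detection_collate)
images, targets = next(iter(loader))
print(len(images), targets[0]['boxes'].shape)
```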