Datasets

All datasets are available in the Zenodo repository.

Datasets Overview

Real Datasets

These datasets contain real images of 2D material flakes, annotated with their respective classes and layer counts.
The training and test images come from separate exfoliation runs, so that no data leakage occurs between the training and test sets.

Dataset	Training Images	Testing Images	Annotated Flakes	Zenodo Link
GrapheneL	425	1362	1 to 4 layers	Download
GrapheneM	325	357	1 to 4 layers	Download
GrapheneH	438	480	1 to 4 layers	Download
WSe2	97	99	1 to 3 layers	Download
WSe2L	92	420	1 to 3 layers	Download
MoSe2	97	63	1 to 2 layers	Download
WS2	53	94	1 layer	Download
hBN_Thin	73	62	1 to 3 layers	Download

Synthetic Datasets

These images are synthetically generated using the MaskTerial Synthetic Data Generator.
Check out that repository for more information on how to generate your own synthetic data.

Dataset	Training Images	Testing Images	Annotated Flakes	Zenodo Link
Graphene	42274	100	1 to 10 layers	Download
CrI3	40338	100	1 to 10 layers	Download
hBN	43511	100	1 to 10 layers	Download
MoSe2	41645	100	1 to 10 layers	Download
TaS2	42850	100	1 to 10 layers	Download
WS2	42451	100	1 to 10 layers	Download
WSe2	41146	100	1 to 10 layers	Download

Structure of the MaskTerial Dataset

A MaskTerial Dataset follows the structure below:

GrapheneH
├───meta_data
│ ├───test_set_name_to_uuid.json
│ └───train_set_name_to_uuid.json
├───RLE_annotations
│ ├───1_shot
│ │ ├───run_0
│ │ │ ├───train_annotations_300.json
│ │ │ └───train_annotations_with_class_300.json
│ │ ├───run_1
│ │ │ ├───train_annotations_300.json
│ │ │ └───train_annotations_with_class_300.json
│ │ ├───...
│ │ └───run_9
│ │   ├───train_annotations_300.json
│ │   └───train_annotations_with_class_300.json
│ ├───2_shot
│ │ └───[similar run_0 to run_9 structure]
│ ├───3_shot
│ │ └───[similar run_0 to run_9 structure]
│ ├───5_shot
│ │ └───[similar run_0 to run_9 structure]
│ ├───10_shot
│ │ └───[similar run_0 to run_9 structure]
│ ├───test_annotations_300.json
│ ├───test_annotations_with_class_200.json
│ ├───test_annotations_with_class_300.json
│ ├───test_annotations_with_class_full.json
│ ├───train_annotations_300.json
│ ├───train_annotations_with_class_200.json
│ ├───train_annotations_with_class_300.json
│ └───train_annotations_with_class_full.json
├───test_images
│ └───[images for testing]
├───test_semantic_masks
│ └───[semantic masks for testing]
├───train_images
│ └───[images for training]
└───train_semantic_masks
  └───[semantic masks for training]
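
The layout above maps directly onto file paths. The following is a minimal sketch of a path helper, not part of MaskTerial itself: the function name `annotation_path` and its parameters are hypothetical, and the function only assembles a path without checking that the file exists (not every combination of split, suffix, and class flag appears in the tree).

```python
from pathlib import Path


def annotation_path(root, split="train", with_class=True, min_area="300",
                    shots=None, run=None):
    """Build the path to one annotation file inside a dataset folder.

    Hypothetical helper: `root` is the dataset folder (e.g. "GrapheneH"),
    `shots` and `run` select a few-shot subset of the training set.
    """
    base = Path(root) / "RLE_annotations"
    if shots is not None:
        # Few-shot folders only hold training annotations, one set per run.
        if split != "train" or run is None:
            raise ValueError("few-shot files need split='train' and a run index")
        base = base / f"{shots}_shot" / f"run_{run}"
    infix = "_with_class" if with_class else ""
    return base / f"{split}_annotations{infix}_{min_area}.json"


print(annotation_path("GrapheneH", "test").as_posix())
print(annotation_path("GrapheneH", shots=5, run=3, with_class=False).as_posix())
```

The first call points at the class-aware test file used for evaluation, the second at the class-agnostic training file of run 3 in the 5-shot setting.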

The RLE_annotations folder contains annotation files in the COCO format. The numeric suffix (for example _200 or _300) indicates the minimum instance size, in pixels, used in that file. The annotation files with the _full suffix contain all annotated flake instances but are not used during evaluation; the evaluation uses the _300 files.
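
To illustrate what the size suffix means, here is a minimal sketch that applies a minimum-area threshold to a COCO-style dictionary. It assumes each annotation carries the standard COCO `area` field and that the threshold is inclusive; both are assumptions, as is the toy data, which is hand-made and not taken from the datasets.

```python
def filter_by_min_area(coco, min_area):
    """Return a shallow copy keeping only annotations with area >= min_area.

    Assumption: the boundary is inclusive; the actual files may use a
    strict "larger than" rule instead.
    """
    kept = [ann for ann in coco["annotations"] if ann.get("area", 0) >= min_area]
    return {**coco, "annotations": kept}


# Hand-made stand-in for a loaded *_full.json file (hypothetical values).
toy = {
    "images": [{"id": 1}],
    "categories": [{"id": 1, "name": "1-layer"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "area": 150},
        {"id": 2, "image_id": 1, "category_id": 1, "area": 250},
        {"id": 3, "image_id": 1, "category_id": 1, "area": 900},
    ],
}

print([ann["id"] for ann in filter_by_min_area(toy, 200)["annotations"]])  # [2, 3]
print([ann["id"] for ann in filter_by_min_area(toy, 300)["annotations"]])  # [3]
```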

The difference between the train_annotations_with_class_300.json and train_annotations_300.json files is that the former contains the class information for each instance, while the latter only records whether a given instance is a flake or not. The test_annotations_with_class_300.json and test_annotations_300.json files serve the same purpose for the test set. These pairs make it possible to ablate the model's performance with and without class information.
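
The relationship between the two file variants can be sketched as a mapping from per-class labels to a single "flake" label. This is only one plausible way to derive a class-agnostic file; the provided files may be encoded differently. The function name, the category name `"flake"`, and the id `1` are assumptions for illustration.

```python
import copy


def drop_class_information(coco, flake_id=1):
    """Return a deep copy in which every instance has the same category id."""
    out = copy.deepcopy(coco)
    out["categories"] = [{"id": flake_id, "name": "flake"}]
    for ann in out["annotations"]:
        ann["category_id"] = flake_id
    return out


# Hand-made example with two layer classes (hypothetical names).
with_class = {
    "categories": [{"id": 1, "name": "1-layer"}, {"id": 2, "name": "2-layer"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 2},
    ],
}

without_class = drop_class_information(with_class)
print(sorted({ann["category_id"] for ann in without_class["annotations"]}))  # [1]
```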

The 1_shot, 2_shot, 3_shot, 5_shot, and 10_shot folders contain the annotations for the respective number of shots. Each of these folders contains runs from run_0 to run_9, which are used for training with different random seeds and to check the stability of the model. Within each run folder, train_annotations_with_class_300.json contains the training annotations with classes, while train_annotations_300.json contains the annotations without classes.

Furthermore, the instances described in the COCO annotation files are also provided as semantic masks in the train_semantic_masks and test_semantic_masks folders. The pixel values in the semantic masks correspond to the class of the instance, with 0 representing the background and 1, 2, 3, and so on representing the different classes of flakes. This is why the semantic masks may look black in an image viewer: the pixel values are very low.
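
To make such a mask visible, the class ids can be stretched over the full display range. The sketch below works on a plain nested list so that it has no dependencies; loading the mask file into that form is left out, and the function name is hypothetical.

```python
def stretch_for_display(mask, max_value=255):
    """Linearly rescale class ids so the largest id maps to max_value."""
    peak = max(max(row) for row in mask)
    if peak == 0:  # background-only mask: nothing to stretch
        return [list(row) for row in mask]
    return [[value * max_value // peak for value in row] for row in mask]


# Toy semantic mask: 0 is background, 1 to 3 are flake classes.
mask = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [0, 0, 3, 3],
]

for row in stretch_for_display(mask):
    print(row)
# [0, 0, 85, 85]
# [0, 170, 170, 85]
# [0, 0, 255, 255]
```

The stretched values are for viewing only; training and evaluation should keep the original class ids.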

Please note that the provided semantic masks only include instances with an area larger than 200 px. Given that the images were captured with a 20x objective, this corresponds to approximately 30 μm².
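
The stated correspondence of 200 px to roughly 30 μm² implies a scale of about 0.15 μm² per pixel. The short sketch below derives other conversions from that figure; the constant is only as accurate as the approximation in the text, and the helper names are hypothetical.

```python
# Scale implied by the text: 200 px corresponds to roughly 30 um^2.
UM2_PER_PIXEL = 30.0 / 200.0  # approximate, about 0.15 um^2 per pixel


def pixels_to_um2(n_pixels):
    """Convert an area in pixels to an approximate area in square micrometres."""
    return n_pixels * UM2_PER_PIXEL


def um2_to_pixels(area_um2):
    """Convert an area in square micrometres to an approximate pixel count."""
    return area_um2 / UM2_PER_PIXEL


print(round(pixels_to_um2(300), 1))      # 45.0, the _300 threshold in um^2
print(round(UM2_PER_PIXEL ** 0.5, 3))    # 0.387, implied pixel edge length in um
```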

Annotation Process

All images were annotated using the labelme tool to generate polygon annotations, which were then converted to RLE annotations. It is also possible to use Label Studio to annotate the images, as it supports the COCO format and can be used to generate RLE annotations. Whichever tool you use, make sure that your final annotations are in the COCO format and use RLE-encoded masks, not polygon annotations.
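
For readers unfamiliar with RLE masks, the following dependency-free sketch shows the uncompressed COCO-style run-length encoding: the mask is read column by column, and `counts` alternates between runs of background and foreground, always starting with background. This is an illustration of the encoding only, not the conversion script used for these datasets; COCO files frequently store a compressed string form of `counts` instead, which this sketch does not handle.

```python
def rle_encode(mask):
    """Encode a binary mask (list of rows of 0/1) as uncompressed COCO-style RLE."""
    height, width = len(mask), len(mask[0])
    # COCO RLE walks the mask in column-major order.
    flat = [mask[r][c] for c in range(width) for r in range(height)]
    counts, current, run = [], 0, 0  # the first run always counts background
    for value in flat:
        if value == current:
            run += 1
        else:
            counts.append(run)
            current, run = value, 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}


def rle_decode(rle):
    """Decode uncompressed COCO-style RLE back into a list of rows."""
    height, width = rle["size"]
    flat, value = [], 0
    for run in rle["counts"]:
        flat.extend([value] * run)
        value = 1 - value  # runs alternate between background and foreground
    return [[flat[c * height + r] for c in range(width)] for r in range(height)]


mask = [
    [0, 1, 1],
    [0, 1, 0],
]
rle = rle_encode(mask)
print(rle)                      # {'size': [2, 3], 'counts': [2, 3, 1]}
print(rle_decode(rle) == mask)  # True
print(sum(rle["counts"][1::2])) # 3, the instance area in pixels
```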
