Add Support for Loading CSV datasets #1042

shenghann · 2023-04-26T01:30:18Z

What is the motivation for this task?

Currently, training anomalib on a custom dataset requires images to be physically separated into folders. To keep track of data revisions over time, we are forced to keep duplicate copies of images, which leads to high local disk usage.
Shuffling the data and experimenting with subsets of the training images would also benefit from this solution.

Describe solution

Supporting data loading through CSVs solves this problem: a master copy of the images is stored only once, and CSVs containing lists of paths pointing to these images serve as the interface for dataset ingestion in anomalib.

The CSVs would follow a predefined schema containing at least:

image_path - Relative path to the image (from the root set in the config)
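
As an illustration of the schema above, here is a minimal sketch of reading such a CSV. It assumes a header row with an `image_path` column; the function name `read_image_paths` and the example paths are hypothetical and are not part of anomalib.

```python
import csv
import io
from pathlib import Path


def read_image_paths(csv_stream, root):
    """Resolve every image_path in an open CSV stream against root."""
    reader = csv.DictReader(csv_stream)
    # fieldnames is None for an empty stream, so check that first.
    if reader.fieldnames is None or "image_path" not in reader.fieldnames:
        raise ValueError("CSV must contain an 'image_path' column")
    root = Path(root)
    return [root / row["image_path"] for row in reader]


# Usage with an in-memory CSV; a real file would be opened with
# open(path, newline="") and passed in the same way.
example = io.StringIO("image_path\ngood/001.png\ngood/002.png\n")
paths = read_image_paths(example, root="datasets/master")
```

Because the CSV only stores relative paths, several CSV revisions can point at the same master copy of the images.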

Additional context

(proposal from team 2 for the OSS Hackathon)

djdameln · 2023-04-26T11:45:37Z

Hi, I understand the motivation for this task, but I'm not sure about the added advantage of a CSV dataloader for the average user. Maybe you could expand the scope a bit and add support for an existing dataset annotation format such as COCO. In this case the dataset structure would not be defined by a .csv file, but by a .json file. This would have the added advantage of being able to pass pixel-level annotations in the form of polygons. Other dataset formats would be possible as well, of course.
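
To make the COCO suggestion concrete, here is a rough sketch of collecting polygon annotations per image from a COCO-style dictionary. It relies only on the standard COCO keys (`images`, `annotations`, `image_id`, `file_name`, `segmentation`); the function name `polygons_by_image` is hypothetical, and treating an image without polygons as normal would be an assumption of this sketch, not something anomalib defines.

```python
import json


def polygons_by_image(coco):
    """Map each file_name to a list of polygons, each a list of (x, y) points."""
    id_to_name = {img["id"]: img["file_name"] for img in coco["images"]}
    result = {name: [] for name in id_to_name.values()}
    for ann in coco.get("annotations", []):
        segmentation = ann.get("segmentation", [])
        # COCO also allows RLE masks (a dict); this sketch skips those.
        if not isinstance(segmentation, list):
            continue
        for flat in segmentation:
            # Polygons are stored flat as [x1, y1, x2, y2, ...].
            points = list(zip(flat[0::2], flat[1::2]))
            result[id_to_name[ann["image_id"]]].append(points)
    return result


coco = json.loads(
    '{"images": [{"id": 1, "file_name": "a.png"}, {"id": 2, "file_name": "b.png"}],'
    ' "annotations": [{"image_id": 1, "segmentation": [[0, 0, 10, 0, 10, 10]]}]}'
)
polygons = polygons_by_image(coco)
```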

Some things to consider:

How will you approach the subset splitting? Do you ask the user to always provide separate annotation files for training, validation and testing? Or do we ask users to provide separate annotation files for normal vs. anomalous images and then perform the subset splitting dynamically?
Implementation-wise, I think a good approach would be to create new subclasses of AnomalibDataset and AnomalibDataModule. You could have a look at FolderDataset and FolderDataModule to get a basic idea of how these classes would be structured. Most of the implementation effort would go into writing the make_dataset function.
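
For the dynamic option, one possible sketch is a seeded hold-out of part of the normal images for testing. The function name `split_normal` and its parameters `test_ratio` and `seed` are hypothetical; this is not anomalib's own splitting logic.

```python
import random


def split_normal(normal_paths, test_ratio=0.2, seed=0):
    """Return (train, test) lists, holding out a fraction of normal images."""
    # Sort first so the result depends only on the seed, not on input order.
    paths = sorted(normal_paths)
    random.Random(seed).shuffle(paths)
    n_test = int(round(len(paths) * test_ratio))
    return paths[n_test:], paths[:n_test]


normal = [f"good/{i:03d}.png" for i in range(10)]
train, test = split_normal(normal, test_ratio=0.2, seed=42)
```

Fixing the seed keeps the split reproducible between runs, which matters when comparing experiments on the same annotation files.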

1 reply

shenghann Apr 26, 2023
Author

Thanks for the response! I forgot to mention that I have already extended FolderDataset and FolderDataModule in my own fork to load datasets from CSV files, and that we're actively using it in our internal Intel projects for manufacturing. Glad I'm on the right track with the implementation. (Would you like me to submit a PR directly so we can go into review of the actual implementation?)

For subset splitting, I'm following the convention set by folder datasets to allow users to define the following in config.yaml - all of which can be either absolute or relative paths:

normal_csv
abnormal_csv
normal_test_csv

In essence, the CSV data module is conceptually the same as the folder data module, except that the list of files is obtained from a file instead of by walking a directory.
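
A minimal sketch of that idea, not the code from the fork: it assumes the three CSVs map to splits the way the folder dataset's directories do (normal for training, abnormal and normal-test for testing) and that each CSV has an `image_path` column. The function name `make_samples` and the record layout are hypothetical.

```python
import csv
import io


def make_samples(normal_csv, abnormal_csv=None, normal_test_csv=None):
    """Build sample records from open CSV streams instead of walking folders."""
    sources = [
        (normal_csv, "train", "normal"),
        (abnormal_csv, "test", "abnormal"),
        (normal_test_csv, "test", "normal"),
    ]
    samples = []
    for stream, split, label in sources:
        if stream is None:  # the optional CSVs may be omitted
            continue
        for row in csv.DictReader(stream):
            samples.append(
                {"image_path": row["image_path"], "split": split, "label": label}
            )
    return samples


samples = make_samples(
    normal_csv=io.StringIO("image_path\ngood/001.png\n"),
    abnormal_csv=io.StringIO("image_path\nbad/001.png\n"),
)
```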

I can look into extending the scope to support annotations, but would love to have CSVs supported natively.

shenghann · 2023-04-26T15:08:42Z

Created a PR here: #1050

0 replies
