dataset dicts can't fit in my RAM memory #3450
-
I have to train an instance segmentation model on a custom dataset of more than 7 million samples. I built the training pipeline with 20k images and everything worked smoothly, but now I can't load the whole dataset dicts into RAM: the segmentation information stored as polygons takes a lot of memory. I'm trying to avoid loading dataset_dicts into memory by using torch Datasets and DataLoaders, but that is a different schema and detectron2 models can't train with it.
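For reference, this is roughly the standard registration pattern I used for the 20k-image version; it materializes every dict up front, which is exactly what stops scaling at 7M samples (get_my_dataset_dicts and "my_train" are placeholder names, not my real code):

from detectron2.data import DatasetCatalog

def get_my_dataset_dicts():
    # Returns list[dict], one dict per image, with "segmentation" polygons
    # stored as lists of float coordinates: this is what fills up the RAM.
    ...

DatasetCatalog.register("my_train", get_my_dataset_dicts)
dicts = DatasetCatalog.get("my_train")  # the whole list is built and held in memory at once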
-
I'm solving it using a PyTorch map-style dataset, so I don't have to preload a list[dict] with the lightweight representation of all the training instances.
Each sample in my dataset is one image plus a JSON file with all its annotations:
from torch.utils.data import Dataset

class yourDatasetClass(Dataset):
    def __init__(self, df, transforms=None):  # df is a dataframe with all the annotations of your dataset
        self.df = df
        self.transforms = transforms

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        target = {}
        target["file_name"] = row["file_name"]  # path to the image on disk
        # ...add the remaining standard fields (height, width, image_id, annotations)
        return target
In PyTorch, and especially torchvision, there are tutorials on building a detection dataset, but there you have to return the image and the targets. In detectron2 you only have to return a dict with the fields specified in the standard datasets definition. If you return a dict with all the necessary data, the default DatasetMapper from detectron2 will take care of building the exact input needed by the model.
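If it helps, here is a minimal sketch of how such a dataset can be plugged into detectron2's training loop (a rough sketch on my side: it assumes cfg is an already-built detectron2 config, and it passes the dataset and mapper explicitly to build_detection_train_loader):

from detectron2.data import DatasetMapper, build_detection_train_loader

dataset = yourDatasetClass(df)
train_loader = build_detection_train_loader(
    dataset=dataset,
    mapper=DatasetMapper(cfg, is_train=True),  # reads "file_name", applies augmentations,
                                               # and converts "annotations" into Instances
    total_batch_size=16,  # adjust to your setup
)

Thanks, Juan Luis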