dataset dicts can't fit in my RAM memory #3450
-
I have to train an instance segmentation model on a custom dataset of more than 7 million samples. I built the training pipeline with 20k images and everything worked smoothly, but now I can't load the whole dataset dicts into RAM: the segmentation information stored as polygons takes a lot of memory. I'm trying to avoid loading dataset_dicts into memory by using torch Datasets and DataLoaders, but that is a different schema and detectron2 models can't train with it.
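For reference, this is roughly the standard registration pattern I used for the 20k-image version; it materializes every dict up front, which is exactly what stops scaling at 7M samples (get_my_dataset_dicts and "my_train" are placeholder names, not my real code):

from detectron2.data import DatasetCatalog

def get_my_dataset_dicts():
    # Returns list[dict], one dict per image, with "segmentation" polygons
    # stored as lists of float coordinates: this is what fills up the RAM.
    ...

DatasetCatalog.register("my_train", get_my_dataset_dicts)
dicts = DatasetCatalog.get("my_train")  # the whole list is built and held in memory at once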
-
I'm solving it using a PyTorch map-style dataset, so I don't have to preload a list[dict] with the lightweight representation of all the training instances.
Each sample in my dataset is one image plus a JSON file with all its annotations:
from torch.utils.data import Dataset

class yourDatasetClass(Dataset):
    def __init__(self, df, transforms=None):  # df is a dataframe with all the annotations of your dataset
        self.df = df
        self.transforms = transforms

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        target = {}
        target["file_name"] = row["file_name"]  # path to the image on disk
        # ...add the remaining standard fields (height, width, image_id, annotations)
        return target
In PyTorch, and especially torchvision, there are tutorials on building a detection dataset, but there you have to return the image and the targets. In detectron2 you only have to return a dict with the fields specified in the standard datasets definition. If you return a dict with all the necessary data, the default DatasetMapper from detectron2 will take care of building the exact input needed by the model.
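If it helps, here is a minimal sketch of how such a dataset can be plugged into detectron2's training loop (a rough sketch on my side: it assumes cfg is an already-built detectron2 config, and it passes the dataset and mapper explicitly to build_detection_train_loader):

from detectron2.data import DatasetMapper, build_detection_train_loader

dataset = yourDatasetClass(df)
train_loader = build_detection_train_loader(
    dataset=dataset,
    mapper=DatasetMapper(cfg, is_train=True),  # reads "file_name", applies augmentations,
                                               # and converts "annotations" into Instances
    total_batch_size=16,  # adjust to your setup
)

Thanks, Juan Luis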