-
Hi, I am trying to write a custom Dataset class. I followed the documentation for the in-memory dataset, but now I need to replace the last line of the class's __init__ so that it handles multiple loads, because I have several files to load into the same dataset. There isn't one file per graph; instead I can have 1000 graphs in data_0.pt and 300 in data_1.pt. I tried to combine the buffers, but that gives an error. Then I tried using the Batch class, but that gives an error on line 122 of batch.py (right after entering the from_data_list method).

Edit: I am also trying the approach without the in-memory dataset, and it seems to assume one graph per file. I am completely lost.
-
In general, all you have to care about is how to create a list of data objects. I'm not exactly sure what you are trying to do, but I suggest putting all graphs from all files into a single data list and saving/loading it via

torch.save(self.collate(data_list), self.processed_paths[0])
self.data, self.slices = torch.load(self.processed_paths[0])

In case you really want to save them into distinct files, you could do the following when saving:

for filename in ...:
    data_list = []  # Create a list of data objects for the file `filename`
    torch.save(self.collate(data_list), osp.join(self.processed_dir, filename))

and load one of them with

self.data, self.slices = torch.load(osp.join(self.processed_dir, filename))
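The single-list suggestion can be sketched without PyG at all. This is only an illustration under assumptions: pickle stands in for torch.save/torch.load, plain dicts stand in for Data objects, and save_chunk / load_all_chunks are hypothetical helper names that do not exist in any library.

```python
import os
import pickle
import tempfile


def save_chunk(graphs, path):
    """Write one list of graph records to `path` (stand-in for torch.save)."""
    with open(path, "wb") as f:
        pickle.dump(graphs, f)


def load_all_chunks(paths):
    """Read every chunk file and merge the records into one flat list."""
    merged = []
    for path in paths:
        with open(path, "rb") as f:
            merged.extend(pickle.load(f))
    return merged


def demo():
    # Two chunk files holding different numbers of graphs, as in the question.
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"data_{i}.pt") for i in range(2)]
        save_chunk([{"num_nodes": n} for n in (3, 5, 2)], paths[0])
        save_chunk([{"num_nodes": n} for n in (4,)], paths[1])
        return load_all_chunks(paths)


merged_graphs = demo()
print(len(merged_graphs))
```

The merged list is what would then be handed to a single collate call, so the number of graphs per file no longer matters.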
-
Answering my own question, since I almost forgot and others may have the same problem. I spent too long staring at the examples for the in-memory and non-in-memory classes. The trick to loading multiple files is where the collate function is called. The examples are:

import torch
from torch_geometric.data import InMemoryDataset, download_url


class MyOwnDataset(InMemoryDataset):
    def __init__(self, root, transform=None, pre_transform=None):
        super(MyOwnDataset, self).__init__(root, transform, pre_transform)
        self.data, self.slices = torch.load(self.processed_paths[0])

    def process(self):
        # Read data into huge `Data` list.
        data_list = [...]
        data, slices = self.collate(data_list)
        torch.save((data, slices), self.processed_paths[0])

and

import os.path as osp
import torch
from torch_geometric.data import Data, Dataset, download_url


class MyOwnDataset(Dataset):
    def __init__(self, root, transform=None, pre_transform=None):
        super(MyOwnDataset, self).__init__(root, transform, pre_transform)

    def process(self):
        i = 0
        for raw_path in self.raw_paths:
            # Read data from `raw_path`.
            data = Data(...)
            torch.save(data, osp.join(self.processed_dir, 'data_{}.pt'.format(i)))
            i += 1

    def len(self):
        return len(self.processed_file_names)

    def get(self, idx):
        data = torch.load(osp.join(self.processed_dir, 'data_{}.pt'.format(idx)))
        return data

In the second example, something is missing when saving in process, and unsurprisingly it is the collate call I mentioned at the beginning. So how do you load multiple files into memory? Like this:

import torch
from torch_geometric.data import InMemoryDataset, download_url


class MyOwnDataset(InMemoryDataset):
    def __init__(self, root, transform=None, pre_transform=None):
        super(MyOwnDataset, self).__init__(root, transform, pre_transform)
        # Load the raw list of `Data` objects and collate it here, not in process.
        data_list = torch.load(self.processed_paths[0])
        self.data, self.slices = self.collate(data_list)

    def process(self):
        # Read data into huge `Data` list.
        data_list = [...]
        torch.save(data_list, self.processed_paths[0])

This example behaves the same as the first one, but if we save more files in process, we can load them all like this:

class MyOwnDataset(InMemoryDataset):
    def __init__(self, root, transform=None, pre_transform=None):
        super(MyOwnDataset, self).__init__(root, transform, pre_transform)
        data_list = []
        for path in self.processed_paths:
            data_list += torch.load(path)
        self.data, self.slices = self.collate(data_list)

    # etc
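To see why the position of the collate call matters, here is a plain-Python sketch of the idea behind collating: concatenate everything into one buffer and keep cumulative offsets so each graph can be sliced back out. It is not PyG's implementation; collate_sketch and get_graph are hypothetical names, and lists of integers stand in for per-graph tensors.

```python
from itertools import accumulate


def collate_sketch(graphs):
    """Concatenate per-graph values and record cumulative slice offsets."""
    data = [x for g in graphs for x in g]
    slices = [0] + list(accumulate(len(g) for g in graphs))
    return data, slices


def get_graph(data, slices, idx):
    """Recover graph `idx` from the concatenated buffer."""
    return data[slices[idx]:slices[idx + 1]]


file_0 = [[1, 2, 3], [4, 5]]  # two graphs from a first file
file_1 = [[6], [7, 8]]        # two graphs from a second file

# Collating once over the merged list gives offsets that are global.
data, slices = collate_sketch(file_0 + file_1)
third_graph = get_graph(data, slices, 2)

# Collating a file on its own gives offsets that restart at zero.
_, slices_file_1 = collate_sketch(file_1)
```

In this sketch the per-file offsets restart at zero, so two separately collated buffers cannot simply be appended to each other without shifting the second set of offsets. Collating once over the merged list avoids that bookkeeping.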