Hi, I have a single graph with ~100,000 nodes and ~1M edges. I created a custom dataset:

```python
import os.path as osp

import torch
from torch_geometric.data import Data, Dataset


class MyDataset(Dataset):
    def __init__(self, root, name, transform=None, pre_transform=None):
        # Set before super().__init__(), which may trigger download()/process()
        # and anything (like __repr__) that reads self.name.
        self.name = name
        super().__init__(root, transform, pre_transform)
        self.data = torch.load(self.processed_paths[0])

    @property
    def raw_file_names(self):
        return ['edge_index.npy', 'edge_attr.npy', 'node_attr.npy', 'node_labels.npy']

    @property
    def processed_file_names(self):
        return ['data.pt']

    def download(self):
        pass

    def process(self):
        # do processing
        data = Data(...)
        torch.save(data, self.processed_paths[0])

    def len(self):
        return len(self.processed_file_names)

    def get(self, idx):
        return torch.load(osp.join(self.processed_dir, 'data.pt'))

    def __repr__(self):
        return f"{self.name}_graph()"
```

Now I need to create a data loader that divides this single graph into mini-batches. I know this can be done by dividing the adjacency matrix into block matrices, where each block would correspond to a batch. When I use …

Thanks,
Replies: 1 comment
`torch_geometric.loader.DataLoader` is used to load a batch of many small graphs. For mini-batch training on a single large graph, use `NeighborLoader`. So your code would look something like:

```python
from torch_geometric.loader import NeighborLoader

dataset = MyDataset(root=..., name=...)
loader = NeighborLoader(dataset.data, num_neighbors=..., batch_size=...)
```