Hi, I have a single graph with ~100,000 nodes and ~1M edges. I created a custom dataset:

```python
import os.path as osp

import torch
from torch_geometric.data import Data, Dataset


class MyDataset(Dataset):
    def __init__(self, root, name, transform=None, pre_transform=None):
        # Set before super().__init__(), which may trigger download()/process()
        # and anything (like __repr__) that reads self.name.
        self.name = name
        super().__init__(root, transform, pre_transform)
        self.data = torch.load(self.processed_paths[0])

    @property
    def raw_file_names(self):
        return ['edge_index.npy', 'edge_attr.npy', 'node_attr.npy', 'node_labels.npy']

    @property
    def processed_file_names(self):
        return ['data.pt']

    def download(self):
        pass

    def process(self):
        # do processing
        data = Data(...)
        torch.save(data, self.processed_paths[0])

    def len(self):
        return len(self.processed_file_names)

    def get(self, idx):
        return torch.load(osp.join(self.processed_dir, 'data.pt'))

    def __repr__(self):
        return f"{self.name}_graph()"
```

Now I need to create a data loader that divides this single graph into mini-batches. I know this can be done by dividing the adjacency matrix into block matrices, where each block would correspond to a batch. When I use …

Thanks,
Replies: 1 comment
`torch_geometric.loader.DataLoader` is used to load a batch of many small graphs. For mini-batch training on a single large graph, use `NeighborLoader`. So your code would look something like:

```python
from torch_geometric.loader import NeighborLoader

dataset = MyDataset(root=..., name=...)
loader = NeighborLoader(dataset.data, num_neighbors=..., batch_size=...)
```