DataLoader Batch Size #9109
-
Hi, I'm confused about how the DataLoader works. I'm trying to solve a problem with a train set consisting of graphs of different sizes (i.e., different numbers of nodes and attributes). Likewise, the test set also contains graphs of different sizes. When I use the DataLoader to get batches for training the network, I see that everything is put into a single graph (i.e., the effective batch size that goes through the network model is 1). The performance I get from the trained network is very poor, so I'm wondering whether this behavior of the DataLoader might be hurting performance, since the network never gets the chance to become robust to graphs of different sizes.
Replies: 1 comment 2 replies
-
The way I understand batching in PyTorch Geometric, each batch consists of one big graph formed by putting the individual graphs together. This is what you observed when you write that the effective batch size is 1. In particular, the adjacency matrices of the individual graphs are stacked into one big block-diagonal matrix, so their original structure remains unaltered. The reason this is a smart way to implement batching is that, for message-passing graph neural networks, the individual graphs that make up the batch do not affect one another, since they are not connected by any edges. All of this is explained nicely in the tutorials.

To answer your question: I presume it is unlikely that the batching is the reason for your network's poor performance. To verify, you could repeat your experiment (i) without batching, by feeding the graphs into your network straight from the dataset, or (ii) with the batch size set to 1.
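To make the block-diagonal idea concrete, here is a minimal sketch (my own toy re-implementation, not the actual PyG code) of what a collate step like `Batch.from_data_list` conceptually does: node indices of each graph are shifted by an offset, and a `batch` vector records which graph each node came from.

```python
# Toy sketch of PyG-style mini-batching: graphs are merged into one big
# graph whose adjacency structure is block-diagonal. Hypothetical helper,
# for illustration only.

def collate_graphs(graphs):
    """Each graph is a pair (num_nodes, edge_list).

    Returns the merged edge list (node indices shifted per graph) and a
    `batch` vector mapping every node to the index of its source graph.
    """
    merged_edges = []
    batch = []
    offset = 0
    for graph_id, (num_nodes, edges) in enumerate(graphs):
        # Shifting both endpoints by `offset` places this graph's adjacency
        # matrix on the diagonal; no cross-graph edges are ever created.
        merged_edges.extend((u + offset, v + offset) for u, v in edges)
        batch.extend([graph_id] * num_nodes)
        offset += num_nodes
    return merged_edges, batch

g1 = (3, [(0, 1), (1, 2)])  # path on 3 nodes
g2 = (2, [(0, 1)])          # single edge on 2 nodes
edges, batch = collate_graphs([g1, g2])
# edges == [(0, 1), (1, 2), (3, 4)]  -- second graph shifted by 3
# batch == [0, 0, 0, 1, 1]
```

Because message passing only moves information along edges, running a GNN on this merged graph gives exactly the same node embeddings as running it on each graph separately; the `batch` vector is then used for per-graph pooling.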