DataLoader Batch Size #9109
-
Hi, I'm confused about how the DataLoader works. I'm trying to solve a problem with a train set consisting of graphs of different sizes (i.e., different numbers of nodes and attributes). Likewise, the test set also contains graphs of different sizes. When I use the DataLoader to get batches for training the network, I see that everything is put into a single graph (i.e., the effective batch size that goes through the network model is 1). The performance I get from the trained network is very poor, so I'm wondering whether this behavior of the DataLoader might be hurting performance, since the network never gets the chance to become robust to graphs of different sizes.
Replies: 1 comment 2 replies
-
The way I understand batching in PyTorch Geometric, each batch consists of one big graph formed by putting the individual graphs together. This is what you observed when you write that the effective batch size is 1. In particular, the adjacency matrices of the individual graphs are stacked into one big block-diagonal matrix, so their original structure remains unaltered. The reason this is a smart way to implement batching is that, for message-passing graph neural networks, the individual graphs that make up the batch do not affect one another, since they are not connected by any edges. All of this is explained nicely in the tutorials.

To answer your question: I presume it is unlikely that the batching is the reason for your network's poor performance. To verify, you could repeat your experiment (i) without batching, by feeding the graphs into your network straight from the dataset, or (ii) with the batch size set to 1.
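To make the block-diagonal idea concrete, here is a minimal sketch (my own toy re-implementation, not the actual PyG code) of what a collate step like `Batch.from_data_list` conceptually does: node indices of each graph are shifted by an offset, and a `batch` vector records which graph each node came from.

```python
# Toy sketch of PyG-style mini-batching: graphs are merged into one big
# graph whose adjacency structure is block-diagonal. Hypothetical helper,
# for illustration only.

def collate_graphs(graphs):
    """Each graph is a pair (num_nodes, edge_list).

    Returns the merged edge list (node indices shifted per graph) and a
    `batch` vector mapping every node to the index of its source graph.
    """
    merged_edges = []
    batch = []
    offset = 0
    for graph_id, (num_nodes, edges) in enumerate(graphs):
        # Shifting both endpoints by `offset` places this graph's adjacency
        # matrix on the diagonal; no cross-graph edges are ever created.
        merged_edges.extend((u + offset, v + offset) for u, v in edges)
        batch.extend([graph_id] * num_nodes)
        offset += num_nodes
    return merged_edges, batch

g1 = (3, [(0, 1), (1, 2)])  # path on 3 nodes
g2 = (2, [(0, 1)])          # single edge on 2 nodes
edges, batch = collate_graphs([g1, g2])
# edges == [(0, 1), (1, 2), (3, 4)]  -- second graph shifted by 3
# batch == [0, 0, 0, 1, 1]
```

Because message passing only moves information along edges, running a GNN on this merged graph gives exactly the same node embeddings as running it on each graph separately; the `batch` vector is then used for per-graph pooling.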