Predictions for last (irregular) batch that is smaller than batch_size in examples/ogbn_products_gat.py #4233
-
Hello! I have one question regarding the code in the example script `ogbn_products_gat.py`. At first, batches are sampled with `NeighborSampler`:

```python
train_loader = NeighborSampler(data.edge_index, node_idx=train_idx,
                               sizes=[10, 10, 10], batch_size=512,
                               shuffle=True, num_workers=12)
```

Here, the `drop_last` argument inherited from `DataLoader` is not specified and therefore defaults to `False`, so the last batch is smaller than the others. Later on, when iterating over the batches, the predictions and ground-truth values for the target nodes in a batch are indexed with `batch_size`:

```python
for batch_size, n_id, adjs in train_loader:
    # [...]
    loss = F.nll_loss(out, y[n_id[:batch_size]])
    # [...]
```

Doesn't this lead to incorrect loss values in the last batch, since it contains fewer than `batch_size` target nodes? Or did I get something wrong here? Thanks a lot for your help in advance!
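To make the concern concrete, here is a tiny toy calculation with made-up loss values (assuming the default mean reduction of `F.nll_loss`): if the per-batch losses are averaged over an epoch, examples in a smaller final batch get more weight than examples in full batches.

```python
# Hypothetical numbers: 5 examples with batch_size=2, so the last batch
# holds a single example.
per_example_losses = [1.0, 1.0, 1.0, 1.0, 4.0]
batches = [per_example_losses[0:2], per_example_losses[2:4], per_example_losses[4:5]]

# Mean-reduced loss per batch (the default behavior of F.nll_loss):
batch_means = [sum(b) / len(b) for b in batches]          # [1.0, 1.0, 4.0]

# Averaging the batch losses over-weights the lone example in the last batch:
print(sum(batch_means) / len(batch_means))                # 2.0
print(sum(per_example_losses) / len(per_example_losses))  # 1.6 (true mean)
```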
-
Yes, that is correct. In theory, it's better to use `drop_last=True` (since, otherwise, the examples in the last batch will have more impact on the loss). However, I doubt that this leads to any noticeable difference in practice. At least I've never had bad experiences when not dropping the last batch. Furthermore, I don't think dropping it is very common to see in PyTorch projects.
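For reference, here is a sketch of how to enable it (assuming the loader setup from the example script). As far as I can tell, `NeighborSampler` forwards extra keyword arguments to `torch.utils.data.DataLoader`, so `drop_last` can be passed directly:

```python
# Same setup as in examples/ogbn_products_gat.py, with drop_last=True added.
# The keyword argument is forwarded to torch.utils.data.DataLoader.
train_loader = NeighborSampler(data.edge_index, node_idx=train_idx,
                               sizes=[10, 10, 10], batch_size=512,
                               shuffle=True, num_workers=12,
                               drop_last=True)  # skip the smaller final batch
```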