I am attempting to use `DataParallel` and `DataListLoader` to train the graph neural network shown below (ignoring all but the forward method) on 8 GPUs. The network takes a heterogeneous graph (`HeteroData`) as input and returns a softmax output. When I run a single sample through the network without applying `DataParallel`, I get the expected softmax output. However, when I apply `DataParallel` to the network module and pass a batch/list of graphs from `DataListLoader` through it in a training loop, I am only returned 8 tensors. Upon further inspection, I found that the number of outputs I get equals the number of GPUs I specify in the `DataParallel` call.

Below the forward method of the network I show the output I get outside of the training loop without applying `DataParallel`, and after that I show the result of passing data through the model in a training loop where `DataParallel` is applied and the batch is loaded from `DataListLoader`.

Simply put, I want to return the softmax output for all samples in the batch with the correct `grad_fn` (softmax) so I can pass them into a user-defined loss function, i.e., a loss function not offered by PyG or regular torch. I am new to graph neural networks and am likely at fault here, but I can't see any obvious reason why this problem would arise based on the docs and examples. Please let me know what I'm doing wrong :)
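For reference, a minimal sketch of the setup described above, using PyG's `DataParallel` and `DataListLoader`. `MyHeteroGNN`, `dataset`, and `my_custom_loss` are placeholders for the actual model, data, and user-defined loss, which are omitted here:

```python
import torch
from torch_geometric.loader import DataListLoader
from torch_geometric.nn import DataParallel

dataset = ...           # placeholder: dataset of HeteroData graphs
model = MyHeteroGNN()   # placeholder: network whose forward returns a softmax

# Wrap the module so it can be fed plain Python lists of graphs.
model = DataParallel(model, device_ids=list(range(8)))
model = model.to('cuda:0')
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

loader = DataListLoader(dataset, batch_size=64, shuffle=True)

for data_list in loader:                # data_list is a plain Python list of graphs
    optimizer.zero_grad()
    out = model(data_list)              # expectation: one softmax row per graph in the batch
    y = torch.cat([data.y for data in data_list]).to(out.device)  # assumes each graph has a `y`
    loss = my_custom_loss(out, y)       # placeholder user-defined loss
    loss.backward()
    optimizer.step()
```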
Replies: 1 comment 1 reply
I am not totally sure if `DataListLoader` works with `HeteroData` objects TBH. The recommended approach is to use `DistributedDataParallel` for this. Here is an example: https://github.com/pyg-team/pytorch_geometric/blob/master/examples/multi_gpu/distributed_batching.py, which should work seamlessly with `HeteroData` objects as well.
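For what it's worth, here is a rough sketch of that `DistributedDataParallel` pattern (the linked example differs in its details; `MyHeteroGNN`, `dataset`, and `my_custom_loss` are again placeholders):

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel
from torch.utils.data.distributed import DistributedSampler
from torch_geometric.loader import DataLoader


def run(rank, world_size, dataset):
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '12355'
    dist.init_process_group('nccl', rank=rank, world_size=world_size)

    # Each process trains on its own shard of the dataset.
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    model = MyHeteroGNN().to(rank)                   # placeholder model
    model = DistributedDataParallel(model, device_ids=[rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(10):
        sampler.set_epoch(epoch)
        for batch in loader:                         # batch is a (Hetero)Batch on this rank
            batch = batch.to(rank)
            optimizer.zero_grad()
            out = model(batch)                       # per-sample softmax outputs
            loss = my_custom_loss(out, batch)        # placeholder user-defined loss
            loss.backward()
            optimizer.step()

    dist.destroy_process_group()


if __name__ == '__main__':
    dataset = ...                                    # placeholder: dataset of HeteroData graphs
    world_size = torch.cuda.device_count()
    mp.spawn(run, args=(world_size, dataset), nprocs=world_size, join=True)
```

Unlike `DataParallel`, each process here builds its own mini-batches and computes gradients locally, so every rank sees the full per-sample output of its own batch, which makes it straightforward to plug in a user-defined loss.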