Hi,
Thank you for this excellent library! I'm working with a large graph dataset that exceeds available memory. While the distributed training documentation (https://pytorch-geometric.readthedocs.io/en/latest/tutorial/distributed_pyg.html) is helpful, I have a question about dataset partitioning:
Since the documentation indicates partitioning occurs on in-memory data, how can I effectively leverage the distributed training system when my entire dataset doesn't fit into memory? Are there recommended strategies or examples for this scenario?

Replies: 1 comment · 2 replies
-
Hey. I'm not a developer of PyG, but I submitted a very similar question a while ago. You should take a look at sampling techniques. PyG has several sampling interfaces implemented in its sampler module. If none of them is exactly what you need, you can always implement your own using the BaseSampler abstract class. Hope this helps.
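To make that a bit more concrete, here is a minimal sketch using one of the built-in sampling interfaces, NeighborLoader from torch_geometric.loader, which drives the neighbor sampler from the sampler module under the hood. The Planetoid/Cora dataset, neighbor counts, and batch size below are placeholder choices purely for illustration, not a recommendation for your data:

```python
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import NeighborLoader

# Small benchmark graph used only to demonstrate the API.
dataset = Planetoid(root='/tmp/Cora', name='Cora')
data = dataset[0]

# Sample a bounded number of neighbors per hop, so every mini-batch
# is a small subgraph rather than the full graph.
loader = NeighborLoader(
    data,
    num_neighbors=[10, 5],        # 10 neighbors at hop 1, 5 at hop 2
    batch_size=128,
    input_nodes=data.train_mask,  # seed nodes to sample around
    shuffle=True,
)

for batch in loader:
    # `batch` is a regular `Data` object holding only the sampled subgraph.
    print(batch.num_nodes, batch.num_edges)
    break
```

If you do end up writing your own sampler by subclassing BaseSampler, I believe you can plug it into the generic node/link loaders in the same way, but double-check the current docs for the exact interface.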