  1. You can read your Parquet files in batches and convert them to the desired output format, but PyG currently expects both the node feature matrix `x` and the edge connectivity `edge_index` to fit into memory. I think how best to achieve this without storing everything in RAM is still an open research/engineering problem. For example, you can load your data into a graph database (and sample from there), or you could create mini-batches beforehand and store them on disk (the Pinterest approach). An alternative is to fit only `edge_index` into memory (for efficient sampling) and use memory-mapped I/O to query node features from disk.
  2. Distributed training is easily doable via PyTorch Lig…
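
To make the last alternative in point 1 concrete, here is a minimal sketch using only the Python standard library. It is not PyG's own API: the file layout (raw little-endian float32 rows), the sizes, and the helper names `write_features`, `fetch_rows`, and `sample_batch` are all assumptions made up for illustration. The edge list stays in memory for sampling, while feature rows are decoded from a memory-mapped file only for the nodes in the current mini-batch.

```python
import mmap
import os
import struct
import tempfile

NUM_NODES = 1000                  # hypothetical graph size
NUM_FEATURES = 16                 # hypothetical feature dimension
ROW_FORMAT = f"<{NUM_FEATURES}f"  # one little-endian float32 row
ROW_BYTES = struct.calcsize(ROW_FORMAT)


def write_features(path):
    """Write a toy feature matrix to disk; every entry of row i equals i."""
    with open(path, "wb") as f:
        for i in range(NUM_NODES):
            f.write(struct.pack(ROW_FORMAT, *([float(i)] * NUM_FEATURES)))


def fetch_rows(mm, node_ids):
    """Decode only the requested rows from the memory-mapped file."""
    return [list(struct.unpack_from(ROW_FORMAT, mm, n * ROW_BYTES)) for n in node_ids]


def sample_batch(edge_index, seeds):
    """Return the seeds plus their 1-hop in-neighbors, using in-memory edges."""
    seed_set = set(seeds)
    src, dst = edge_index
    nodes = set(seeds)
    for s, d in zip(src, dst):
        if d in seed_set:
            nodes.add(s)
    return sorted(nodes)


# Toy ring graph kept fully in memory: edge i -> (i + 1) % NUM_NODES.
edge_index = (list(range(NUM_NODES)), [(i + 1) % NUM_NODES for i in range(NUM_NODES)])

# Stand-in for a feature file produced once from the Parquet data.
path = os.path.join(tempfile.mkdtemp(), "x.bin")
write_features(path)

with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    batch_nodes = sample_batch(edge_index, seeds=[0, 500])
    batch_x = fetch_rows(mm, batch_nodes)  # only these rows are read from disk
```

In a real pipeline, `numpy.memmap` would typically play the role of the raw `mmap` object, and the fetched rows would be converted to a tensor before being passed to the model. The linear scan in `sample_batch` is only there to keep the sketch short; a proper neighbor sampler would use an indexed adjacency structure.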

Answer selected by jake-rbh