Hi, I am working on node classification on a large graph. All of the data is stored on S3 in sharded parquet files. Are there any best practices or pointers for loading and training on such a large graph?
Answered by rusty1s (Apr 1, 2022):
Most training setups in PyG assume that both node features `x` and edge connectivity `edge_index` fit into memory. I think it is still an open research/engineering problem how to best achieve this without storing everything in RAM. For example, you can load your data into a graph database (and sample from there), or you can create mini-batches beforehand and store them on disk (the Pinterest approach). An alternative is to fit just `edge_index` into memory (for efficient sampling) and make use of memory-mapped I/O to query node features from disk.
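
A minimal sketch of that last option, assuming the node features have been exported once from the parquet shards into a flat `float32` binary file (the file names, shapes, and sampling fan-out below are hypothetical): `edge_index` stays in RAM for neighbor sampling, while `np.memmap` pages in only the feature rows each mini-batch actually touches.

```python
import numpy as np
import torch
from torch_geometric.loader import NeighborSampler

num_nodes, num_features = 10_000_000, 128  # hypothetical graph size

# Node features live on disk; np.memmap reads only the rows we index.
x_mmap = np.memmap('node_features.bin', dtype=np.float32, mode='r',
                   shape=(num_nodes, num_features))

# Edge connectivity is assumed to fit into memory for efficient sampling.
edge_index = torch.load('edge_index.pt')  # shape [2, num_edges]

loader = NeighborSampler(edge_index, sizes=[10, 10], num_nodes=num_nodes,
                         node_idx=torch.arange(num_nodes),
                         batch_size=1024, shuffle=True)

for batch_size, n_id, adjs in loader:
    # n_id holds the global indices of every node in this mini-batch's
    # computation graph; gather just their features from disk.
    x = torch.from_numpy(x_mmap[n_id.numpy()])
    ...  # model forward/backward on (x, adjs)
```

And a sketch of the precomputed mini-batch route (the Pinterest approach): sample subgraphs once, persist each as its own file, and stream them back during training. `sample_minibatches` is a hypothetical stand-in for whatever sampler produces the batches.

```python
import glob
import torch

# One-time preprocessing: persist each sampled subgraph to disk.
for i, batch in enumerate(sample_minibatches()):  # hypothetical sampler
    torch.save(batch, f'batches/batch_{i:06d}.pt')

# Training: iterate over the stored batches; the full graph never
# needs to be resident in RAM.
for path in sorted(glob.glob('batches/*.pt')):
    batch = torch.load(path)
    ...  # model forward/backward on batch
```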