The problem of loading large datasets such as ogbn-papers100M #5536
-
Hi! I want to load the ogbn-papers100M dataset (the total size is 56GB) and partition the graph with METIS. However, the graph is so large that my machine's memory cannot hold the entire graph for partitioning; in particular, the feature vectors of the vertices take up a large proportion of the total memory. So I want to ask whether PyG can load the graph structure and the vertex feature vectors separately. With this, we could load the graph structure first and perform the graph partitioning with METIS, and then load the vertex feature vectors during the training stage.
-
Yeah, just loading the `edge_index` representation and applying METIS on top of it should work as well and should heavily reduce the memory requirements necessary to process and split the graph. I don't think OGB supports this currently though, so maybe it is best to open an issue there? Alternatively, you should be able to create your own pre-processing script following the code of OGB.