Hi, I am developing a dataset in which each sample is a subgraph of a much larger graph. The graphs are heterogeneous and contain edges of different types and features of different lengths. The subgraphs overlap, so generating and storing all of them would use too much memory (even on disk).
I have a mapping from each sample index to all the node indices in the larger graph that belong to that sample's subgraph. All edges between the selected nodes should also be kept. Currently I just pass these node indices to the subgraph method of the HeteroData class, which then creates the sample, i.e. the subgraph.
The dataset I am working on extends InMemoryDataset and overrides the get method: it loads the required larger graph (the same way the standard InMemoryDataset does) and then builds and returns the requested subgraph on the fly using the .subgraph method of HeteroData. This gives me around 50 samples per second on a single CPU.
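A framework-free sketch of the on-the-fly approach described above, using plain Python lists instead of PyG tensors. All names here are hypothetical, and this is not PyG's implementation of .subgraph: the full graph is stored once, and each lookup builds the induced subgraph by slicing the selected node features and keeping only the edges whose two endpoints are both selected, relabeled to local indices. The class implements __len__/__getitem__, the map-style protocol that a PyTorch DataLoader consumes.

```python
from typing import Dict, List, Tuple

NodeType = str
EdgeType = Tuple[str, str, str]          # (source type, relation, target type)
EdgeList = Tuple[List[int], List[int]]   # parallel lists of source / target indices


class OnTheFlySubgraphDataset:
    """Hypothetical map-style dataset: full graph stored once, samples built on demand."""

    def __init__(
        self,
        x: Dict[NodeType, List[List[float]]],
        edges: Dict[EdgeType, EdgeList],
        samples: List[Dict[NodeType, List[int]]],
    ) -> None:
        self.x = x              # node features of the full graph, per node type
        self.edges = edges      # edge lists of the full graph, per edge type
        self.samples = samples  # sample index -> selected global node indices per type

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(
        self, idx: int
    ) -> Tuple[Dict[NodeType, List[List[float]]], Dict[EdgeType, EdgeList]]:
        subset = self.samples[idx]
        # Global -> local index table per node type; -1 marks nodes not in the sample.
        remap: Dict[NodeType, List[int]] = {}
        for ntype, feats in self.x.items():
            table = [-1] * len(feats)
            for local, glob in enumerate(subset.get(ntype, [])):
                table[glob] = local
            remap[ntype] = table

        # Slice node features in the order given by the subset.
        x_sub = {nt: [self.x[nt][i] for i in subset.get(nt, [])] for nt in self.x}

        edges_sub: Dict[EdgeType, EdgeList] = {}
        for (src_t, rel, dst_t), (src, dst) in self.edges.items():
            new_src: List[int] = []
            new_dst: List[int] = []
            for s, d in zip(src, dst):
                ls, ld = remap[src_t][s], remap[dst_t][d]
                if ls >= 0 and ld >= 0:  # keep only edges with both endpoints selected
                    new_src.append(ls)
                    new_dst.append(ld)
            edges_sub[(src_t, rel, dst_t)] = (new_src, new_dst)
        return x_sub, edges_sub
```

Note that each lookup scans every edge of every edge type in the full graph, so in this sketch the per-sample cost grows with the size of the full graph rather than with the size of the subgraph.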
I wanted to check whether this is a valid approach, or whether there is a better way of achieving this. Is there an existing Sampler that I could use or modify for this use case that would be faster than what I described above?
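One possible direction, sketched framework-free and assuming each subgraph is much smaller than the full graph: group each edge type by source node once (a CSR layout), so that building a sample only visits the out-edges of the selected source nodes instead of every edge. This is a generic CSR technique, not an existing PyG Sampler, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

EdgeList = Tuple[List[int], List[int]]  # parallel lists of source / target indices


def build_csr(num_src: int, src: List[int], dst: List[int]) -> Tuple[List[int], List[int]]:
    """Group one edge type by source node (done once, up front).

    Targets of source node u are col[rowptr[u]:rowptr[u + 1]].
    """
    rowptr = [0] * (num_src + 1)
    for s in src:
        rowptr[s + 1] += 1            # out-degree of s, stored one slot to the right
    for i in range(num_src):
        rowptr[i + 1] += rowptr[i]    # prefix sum turns degrees into offsets
    col = [0] * len(src)
    fill = rowptr[:-1]                # list slice is a copy: next free slot per source
    for s, d in zip(src, dst):
        col[fill[s]] = d
        fill[s] += 1
    return rowptr, col


def induced_edges(rowptr: List[int], col: List[int],
                  src_subset: List[int], dst_subset: List[int]) -> EdgeList:
    """Induced edges of one edge type for one sample, relabeled to local indices.

    Only the out-edges of the selected source nodes are visited.
    """
    dst_local: Dict[int, int] = {g: l for l, g in enumerate(dst_subset)}
    new_src: List[int] = []
    new_dst: List[int] = []
    for ls, u in enumerate(src_subset):
        for v in col[rowptr[u]:rowptr[u + 1]]:
            ld = dst_local.get(v)
            if ld is not None:        # target node is also in the sample
                new_src.append(ls)
                new_dst.append(ld)
    return new_src, new_dst
```

You would call build_csr once per edge type on the full graph and induced_edges once per edge type per sample. The per-sample cost is then proportional to the selected nodes and their out-degrees; the resulting edges come out grouped by source node in subset order rather than in the original edge order.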
I want to iterate through this dataset using a DataLoader with multiple workers to speed it up further. If I use the version of InMemoryDataset described above, would each worker have to load its own copy of the dataset into memory, or can they all share a single instance of the data?
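For context, PyTorch's DataLoader documentation notes that with the fork start method, workers can typically access the dataset through the cloned address space, whereas with spawn the dataset is sent to each worker via pickling; PyTorch tensors also offer Tensor.share_memory_(). As a framework-free sketch of the general "load once, attach in workers" idea (not a description of how PyTorch or PyG work internally), Python's multiprocessing.shared_memory lets a parent publish a flat array once and lets workers attach to it by name without copying it. The helper names are hypothetical.

```python
from array import array
from multiprocessing import shared_memory
from typing import List

ITEM = array("q").itemsize  # bytes per int64 entry


def publish_int64(values: List[int]) -> shared_memory.SharedMemory:
    """Parent: copy a flat int64 array (e.g. a concatenated edge list) into shared memory once."""
    data = array("q", values)
    shm = shared_memory.SharedMemory(create=True, size=max(ITEM, len(data) * ITEM))
    shm.buf[: len(data) * ITEM] = data.tobytes()
    return shm


def read_int64(name: str, length: int, positions: List[int]) -> List[int]:
    """Worker: attach to the block by name and read selected entries without copying the array."""
    if any(not 0 <= p < length for p in positions):
        raise IndexError("position outside the published array")
    shm = shared_memory.SharedMemory(name=name)
    view = shm.buf.cast("q")  # zero-copy int64 view over the shared bytes
    try:
        return [view[p] for p in positions]
    finally:
        view.release()  # all views must be released before close()
        shm.close()     # detach only; the parent owns unlink()
```

In a real setup the parent would pass shm.name and the array length to the workers, and call close() and unlink() once all workers are done.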
Thank you for your advice!