-
Good evening, fellas. First of all, I want to clarify that I'm not an expert at all in PyG and its structure or conventions, so bear with me a little bit. OK, on to the issue:

The Data
I'm trying to build a custom dataset for research purposes. The data in question is somewhat unconventional; I haven't found anything similar to it in my research. I want to represent a radiography dataset as an undirected graph where the vertices are LBP histograms (42 decimal features) and the edge values are the cosine similarity between these histograms. I want to test whether there is any advantage in representing this data in a graph format.

The Problem
Due to the nature of the data, as expected, the number of edges is huge. I'm talking millions, or even a couple of hundred million. Since even a simple dataset (like the one I'm working on) contains some tens of thousands of images, the maximum number of edges is naturally that number squared. Using the COO format, I managed to curb the memory problem a little bit, but some layers simply do not support sparse operations (e.g. the MinCut pooling layer). I looked into batching, but I'm having trouble understanding it.

So, what I'm looking for is some guidance on how to properly structure this data following torch conventions, in a manner that my computer doesn't explode. Any tip or hint would be greatly appreciated. Thank you in advance.

Specs
OS -> Fedora 39

Edit: System specs
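The post doesn't include code, but one common way to keep the edge count from reaching N squared is to threshold the similarity so only strong pairs become edges, building the COO arrays directly. Here is a minimal NumPy sketch of that idea; the function name `cosine_threshold_edges`, the threshold value, and the toy data are illustrative assumptions, not from the discussion.

```python
import numpy as np

def cosine_threshold_edges(features, threshold=0.9):
    """Return COO edge arrays (row, col, weight) for pairs of rows in
    `features` whose cosine similarity exceeds `threshold`."""
    # L2-normalise rows so a plain dot product equals cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    # NOTE: for tens of thousands of rows, compute `sim` in row blocks
    # instead of all at once -- a full (N, N) matrix is itself the
    # memory problem this sketch is trying to avoid.
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)              # exclude self-loops
    row, col = np.nonzero(sim > threshold)  # keep only strong edges
    return row, col, sim[row, col]

# Toy usage: 6 "LBP histograms" with 42 features each.
rng = np.random.default_rng(0)
feats = rng.random((6, 42))
row, col, w = cosine_threshold_edges(feats, threshold=0.8)
```

Because the similarity matrix is symmetric, every kept pair appears in both directions, which matches an undirected graph in COO form.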
-
Is this a single graph or multiple graphs? Is this a node or graph classification dataset? You are right that some layers do not support this scale (such as dense pooling layers). Happy to give you more advice if you share some more details on what you are trying to do.
In this case, please take a look at `NeighborLoader` or `LinkNeighborLoader`, which you can use to scale node-level or link-level tasks on a single graph.