How to traverse each node and efficiently sample the edges of its subgraphs? #8677
My data looks like this:

`Data(edge_index=[2, 621260], path_id=[621260], time=[621260], label=[621260], node_ids=[295939], edge_ids=[621260])`

This is a homogeneous graph. Each edge has a feature called `path_id` (values 0, 1, 2, 3, 4, ...) along with a timestamp, `time`.

I want to sample as follows. For each node, take the `path_id` of the first 5 edges, in chronological order, of its 1-hop subgraph. Then take the `path_id` of the first 5 edges, in chronological order, of its 2-hop subgraph, and do the same for its 3-hop subgraph. If a hop has fewer than 5 edges, use -1 in place of the missing `path_id` values. This gives one `path_id` sequence per node. My prediction model then uses the first k + n + s - 1 `path_id` values (where k, n, and s are the number of edges taken from hops 1, 2, and 3) to predict the last one.

I implemented this sampling with NetworkX, but it is slow. Does PyG provide such a sampling function, or which existing function could I modify to achieve this efficiently? `NeighborSampler` seems like it could work, but when I tried it, it did not seem to be able to list the sequence of edges sampled for each center node.

@rusty1s I would really appreciate an answer if you can find time in your busy schedule. Thank you.
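A minimal sketch of the sampling described above in plain Python, without NetworkX. It rests on assumptions the question leaves open: edges are treated as directed `src -> dst` (a PyG-style undirected graph stores both directions, so this covers all incident edges), the hop-h candidates are the outgoing edges of the nodes reached by the edges sampled at hop h-1, and "first 5 in chronological order" means smallest `time`, with ties broken by edge id. The names `build_sorted_out_edges`, `sample_path_sequence` and `to_training_pair` are hypothetical helpers, not PyG functions.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

PAD = -1  # filler used when a hop yields fewer than `per_hop` edges


def build_sorted_out_edges(src: Sequence[int], time: Sequence[float]) -> Dict[int, List[int]]:
    """Map each source node to its outgoing edge ids, earliest first.

    Assumption: edges are directed src -> dst. For an undirected graph stored
    PyG-style (both directions in edge_index) this covers all incident edges.
    """
    out: Dict[int, List[int]] = defaultdict(list)
    for e, s in enumerate(src):
        out[s].append(e)
    for edges in out.values():
        edges.sort(key=lambda e: (time[e], e))  # tie-break on edge id
    return dict(out)


def sample_path_sequence(node, out_edges, dst, time, path_id, per_hop=5, num_hops=3):
    """Hypothetical helper: return num_hops * per_hop path_ids for one node."""
    seq: List[int] = []
    frontier = {node}
    for _ in range(num_hops):
        # Any edge among the overall earliest `per_hop` must also be among the
        # earliest `per_hop` of its own source node, so only those are candidates.
        candidates = [e for u in frontier for e in out_edges.get(u, [])[:per_hop]]
        chosen = heapq.nsmallest(per_hop, candidates, key=lambda e: (time[e], e))
        seq.extend(path_id[e] for e in chosen)
        seq.extend([PAD] * (per_hop - len(chosen)))
        # Assumption: the next hop expands only through the edges just sampled.
        frontier = {dst[e] for e in chosen}
    return seq


def to_training_pair(seq: List[int]) -> Tuple[List[int], int]:
    """Split a sequence into (all ids except the last, last id)."""
    return seq[:-1], seq[-1]


if __name__ == "__main__":
    # Toy graph: 0->1 (t=3), 0->2 (t=1), 1->3 (t=2), 2->3 (t=5), 3->0 (t=4)
    src = [0, 0, 1, 2, 3]
    dst = [1, 2, 3, 3, 0]
    time = [3, 1, 2, 5, 4]
    path_id = [10, 20, 30, 40, 50]
    out_edges = build_sorted_out_edges(src, time)
    seq = sample_path_sequence(0, out_edges, dst, time, path_id, per_hop=2, num_hops=3)
    print(seq)  # [20, 10, 30, 40, 50, -1]
    print(to_training_pair(seq))  # ([20, 10, 30, 40, 50], -1)
```

Sorting each node's outgoing edges once means a hop only inspects the first `per_hop` edges of each frontier node instead of whole neighborhoods. For the real graph you would pass `edge_index[0].tolist()`, `edge_index[1].tolist()`, `time.tolist()` and `path_id.tolist()` and loop over all nodes; this is still a Python loop, so for very large graphs a compiled implementation may be needed.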
Replies: 1 comment
I think your best bet would be to implement such a sampling strategy in C++ directly if efficiency is a concern. In PyG, you may be able to utilize the temporal sampling strategy of `NeighborLoader` to achieve this (e.g., by using `temporal_strategy="last"` and inverting your `path_id` such that edges you want to sample have a higher "timestamp").