Hello,

data = HeteroData()
data['process'].x = ...
data['socket'].x = ...
data['file'].x = ...
data['process','is_parent_of','process'].edge_index = ...
data['process','writes','file'].edge_index = ...
data['file','read_by','process'].edge_index = ...
data['process','connects_to', 'socket'].edge_index = ...
data['socket','sends_to','process'].edge_index = ...

I have no features for the nodes in my dataset. The only features I could give them are either a constant 1 for each node or the node degree. From what I've read in many papers, though, node degree is not very informative, especially for heterogeneous graphs, because the degrees are usually very similar and don't differentiate between different kinds of edges. Computing features in a pre-processing step sounds reasonable, but how can you apply pre-processing when the dataset is not yet initialized?

Another question is how batching works when each graph is its own HeteroData() object. Do you have to write your own batch function, or how does batch() know where the edges of Graph1 end and those of Graph2 start?

Best regards, Zytrus
This is tricky. In general, it's hard to do any better than node degree statistics :)

You can also use MetaPath2Vec, but this does not really work in an inductive learning scenario where you want to apply your model to unseen graphs. Another alternative is to lift transforms.LocalDegreeProfile to the heterogeneous graph case.

Mini-batching works the same way as in the homogeneous scenario, except that you now have a batch vector for each node type. You can then use global_pooling operators for each node type separately.
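To make the two points above concrete, here is a small dependency-free sketch, not PyG's actual implementation. It computes degree features separately per edge type (so they do distinguish between kinds of edges), then shows conceptually what batching does: node indices are shifted by a per-node-type offset, and a per-node-type batch vector records which graph each node came from. The toy graphs and the helper names (`relation_degree_features`, `collate`, `sum_pool`) are made up for illustration. In PyG itself you would pass a list of HeteroData objects to `torch_geometric.loader.DataLoader` and read the vectors from e.g. `batch['process'].batch`, without writing a batch function yourself.

```python
from collections import defaultdict

# Fixed list of edge types so every graph gets the same feature layout.
EDGE_TYPES = [("process", "writes", "file")]

# Two toy graphs (hypothetical data): node counts per type, edges per edge type.
graph1 = {
    "num_nodes": {"process": 2, "file": 1},
    "edges": {("process", "writes", "file"): [(0, 0), (1, 0)]},
}
graph2 = {
    "num_nodes": {"process": 1, "file": 2},
    "edges": {("process", "writes", "file"): [(0, 0), (0, 1)]},
}


def relation_degree_features(graph, edge_types):
    """Per node: one out-degree and one in-degree column for every edge type."""
    feats = {
        ntype: [[0] * (2 * len(edge_types)) for _ in range(n)]
        for ntype, n in graph["num_nodes"].items()
    }
    for col, etype in enumerate(edge_types):
        src_t, _, dst_t = etype
        for s, d in graph["edges"].get(etype, []):
            feats[src_t][s][2 * col] += 1      # out-degree under this relation
            feats[dst_t][d][2 * col + 1] += 1  # in-degree under this relation
    return feats


def collate(graphs):
    """Merge graphs into one big graph with a batch vector per node type."""
    offsets = defaultdict(int)     # nodes seen so far, per node type
    batch_vec = defaultdict(list)  # graph index of every node, per node type
    edges = defaultdict(list)      # re-indexed edges, per edge type
    for g_idx, g in enumerate(graphs):
        for etype, pairs in g["edges"].items():
            src_t, _, dst_t = etype
            # Source and destination use the offsets of their own node types.
            edges[etype] += [(s + offsets[src_t], d + offsets[dst_t]) for s, d in pairs]
        for ntype, n in g["num_nodes"].items():
            batch_vec[ntype] += [g_idx] * n
            offsets[ntype] += n
    return dict(batch_vec), dict(edges)


def sum_pool(rows, batch, num_graphs):
    """Sum node features per graph, using the batch vector of one node type."""
    out = [[0] * len(rows[0]) for _ in range(num_graphs)]
    for row, g_idx in zip(rows, batch):
        out[g_idx] = [a + b for a, b in zip(out[g_idx], row)]
    return out


graphs = [graph1, graph2]
batch_vec, batch_edges = collate(graphs)

# Stack the per-graph features in the same order the batch vectors were built.
x = defaultdict(list)
for g in graphs:
    for ntype, rows in relation_degree_features(g, EDGE_TYPES).items():
        x[ntype] += rows

# One pooled vector per graph and per node type.
pooled = {nt: sum_pool(x[nt], batch_vec[nt], len(graphs)) for nt in x}
```

The batch vector is what tells a pooling operator where Graph1 ends and Graph2 starts: here `batch_vec["process"]` marks the first two process nodes as belonging to graph 0 and the third to graph 1, and the edges of the second graph are shifted so they point at its own nodes.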