Heterogeneous Graph with no features #4408

Zytrus · 2022-04-04T13:45:00Z

Hello,
I am new to GNNs and am currently working on a heterogeneous GNN for graph-level prediction. I've read many other discussions, but I haven't come across the same problem that I currently have.
My problem is that I have a graph with 3 node types and 5 edge types, and none of them have features. I'd like to have features that take both the structural information and the node and edge types into account:

from torch_geometric.data import HeteroData

data = HeteroData()

data['process'].x = ...
data['socket'].x = ...
data['file'].x = ...

data['process', 'is_parent_of', 'process'].edge_index = ...
data['process', 'writes', 'file'].edge_index = ...
data['file', 'read_by', 'process'].edge_index = ...
data['process', 'connects_to', 'socket'].edge_index = ...
data['socket', 'sends_to', 'process'].edge_index = ...

I have no features for the nodes in my dataset. The only features I could give them are either a constant 1 for each node or the node degree. However, as I've read in many papers, the node degree is not very informative, especially for heterogeneous graphs, because degrees are usually very similar across nodes and don't differentiate between different kinds of edges.
So my question is: how can I give the nodes features?

Computing features in a pre-processing step sounds reasonable, but how can you apply pre-processing when the dataset is not yet initialized?
A little help would be very much appreciated!

Another question is how batching works when each example in a batch is one graph stored as HeteroData(). Do you have to write a batch function yourself, or how does batch() know where the edges of Graph1 end and the edges of Graph2 start?

Best regards, Zytrus

rusty1s · 2022-04-04T14:51:24Z

Maintainer

This is tricky. In general, it's hard to get any better than using node degree statistics :)

You can also use MetaPath2Vec but this does not really work in an inductive learning scenario where you want to apply your model on unseen graphs. An additional alternative is to lift transforms.LocalDegreeProfile to the heterogeneous graph case.

For mini-batching, it works the same way as in the homogeneous scenario, except that now you have a batch vector for each node type. You can then use global_pooling operators for each node type separately.

15 replies

rusty1s Apr 10, 2022
Maintainer

This is not supported at the moment. You may need to add dummy node types for these graphs (e.g., by setting data[node_type].x = torch.empty((0, num_features), dtype=torch.float)).

Zytrus Apr 11, 2022
Author

This error appears when I have data[node_type].x = torch.empty((0, 1), dtype=torch.float) for the empty features:

  File "C:\Users\chenx\Documents\Privacy\SystemCallGraph_node_degree.py", line 244, in process        
    data, slices = self.collate(hetero_list)
  File "C:\Users\chenx\miniconda3\lib\site-packages\torch_geometric\data\in_memory_dataset.py", line 112, in collate
    data, slices, _ = collate(
  File "C:\Users\chenx\miniconda3\lib\site-packages\torch_geometric\data\collate.py", line 85, in collate
    value, slices, incs = _collate(attr, values, data_list, stores,
  File "C:\Users\chenx\miniconda3\lib\site-packages\torch_geometric\data\collate.py", line 147, in _collate
    value = torch.cat(values, dim=cat_dim or 0, out=out)
RuntimeError: Tensors must have same number of dimensions: got 1 and 2

And the error 'int' object is not callable appears again when I set data[node_type].x = torch.empty(0, dtype=torch.float). So now I have dummy values for each graph. Do you have a hunch what this 'int' error could mean?
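One way the "got 1 and 2" error in the traceback can arise (an assumption about the cause, since the dataset code is not shown) is that the non-empty graphs store x as a 1-D tensor, for example a plain vector of node degrees, while the placeholder is 2-D. The sketch below reproduces the mismatch with torch.cat directly and shows that reshaping to [num_nodes, 1] makes the shapes compatible; the tensor values are made up.

```python
import torch

degrees = torch.tensor([1.0, 2.0, 3.0])               # 1-D, shape [3]
placeholder = torch.empty((0, 1), dtype=torch.float)  # 2-D, shape [0, 1]

# Concatenating a 1-D and a 2-D tensor fails, as in the traceback.
try:
    torch.cat([degrees, placeholder], dim=0)
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

# Store features as [num_nodes, num_features] everywhere.
features = degrees.view(-1, 1)
stacked = torch.cat([features, placeholder], dim=0)
```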

Thank you for the help so far!

Best regards, Zytrus

Zytrus Apr 11, 2022
Author

The error points to this line of code:

slices = cumsum([value.size(cat_dim or 0) for value in values])

Here value.size, which is an int, is called like a function with (cat_dim or 0). Could that be a bug?

rusty1s Apr 11, 2022
Maintainer

IMO, this may only happen if this attribute is a tensor in one example while it is an int in another. Can you confirm that this is not the case? Otherwise, please try to construct a simple reproducible example and I am happy to take a look.

Zytrus Apr 12, 2022
Author

This is where my mistake was, thank you! I didn't know the array type had an impact on the processing; I thought both Tensor and NumPy would do the trick. Now it's running through. Thank you very much!
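The 'int' object is not callable message is consistent with a NumPy array ending up where a tensor is expected: on a NumPy array, size is an integer attribute, whereas on a torch.Tensor it is a method. A small illustration with made-up values, including the conversion via torch.from_numpy:

```python
import numpy as np
import torch

degrees = np.array([[1.0], [2.0], [0.0]], dtype=np.float32)

# ndarray.size is an int (the total number of elements), not a method.
size_is_int = isinstance(degrees.size, int)

try:
    degrees.size(0)
    error_message = None
except TypeError as err:
    error_message = str(err)

# Converting to a tensor before assigning it to data[node_type].x avoids this.
x = torch.from_numpy(degrees)
num_nodes = x.size(0)
```

Converting every attribute to a torch.Tensor before the graphs are collated keeps the attribute types consistent across examples.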
