Representation learning using GraphSAGE on Heterogeneous Graphs #8889

13bmartens · 2024-02-09T11:11:30Z


Hi Team,
Thank you for the great work on PyG!

I am currently working with a custom heterogeneous dataset with different node types in the area of firmographics (describing companies and their hierarchies).

The three main node types are company sites, domestic company entities, and global company entities. They are described both by node properties (employee count, yearly turnover) and by their connections: to each other, to a country (country -[is in]-> region), and to an industry (Site -[has ISIC]-> ISIC Code -[is in]-> ISIC Group -[is in]-> ISIC Division).
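Written out in PyG's `(source node type, relation, destination node type)` edge-type convention, this schema might look like the minimal plain-Python sketch below. Only `("site", "has_isic", "isic")` appears in this thread; every other type and relation name is a hypothetical placeholder. The helper adds one reverse edge type per relation (mirroring the `rev_` prefix that `torch_geometric.transforms.ToUndirected` uses on heterogeneous graphs) and lists node types that never occur as a destination. This is worth checking because `to_hetero` warns that the representations of such node types are not updated during message passing.

```python
# Hypothetical schema for the firmographics graph in PyG's
# (src_type, relation, dst_type) convention. Only ("site", "has_isic", "isic")
# comes from the thread; the other names are illustrative placeholders.
EDGE_TYPES = [
    ("site", "part_of", "domestic"),
    ("domestic", "part_of", "global"),
    ("site", "located_in", "country"),
    ("country", "is_in", "region"),
    ("site", "has_isic", "isic"),
    ("isic", "is_in", "isic_group"),
    ("isic_group", "is_in", "isic_division"),
]


def add_reverse_edge_types(edge_types):
    """Append a reversed edge type for every (src, rel, dst) with src != dst."""
    result = list(edge_types)
    for src, rel, dst in edge_types:
        if src != dst:
            result.append((dst, f"rev_{rel}", src))
    return result


def node_types_without_incoming(edge_types):
    """Node types that never appear as a destination, i.e. receive no messages."""
    node_types = {t for src, _, dst in edge_types for t in (src, dst)}
    destinations = {dst for _, _, dst in edge_types}
    return sorted(node_types - destinations)


print(node_types_without_incoming(EDGE_TYPES))  # ['site']
print(node_types_without_incoming(add_reverse_edge_types(EDGE_TYPES)))  # []
```

With only the forward edges, `site` would never receive messages; after adding reverse edges every node type does.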

Using this graph, I want to learn an embedding that places similar company nodes close together in the embedding space. I would like the learning to be inductive, since unseen companies should be embedded without retraining.

I was able to build the graph, use a LinkNeighborLoader, and define a GraphSAGE-based GNN.

What I am struggling with is defining a proper loss function. Currently, I use the LinkNeighborLoader to generate negative samples between two node types (Company Site and ISIC Code), as done in the example, and compute the loss like this:

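    import torch.nn.functional as F

    # Score the positive and sampled negative (site, isic) pairs of this mini-batch.
    pred = model(sampled_data, sampled_data["site", "has_isic", "isic"].edge_label_index)
    ground_truth = sampled_data["site", "has_isic", "isic"].edge_label
    loss = F.binary_cross_entropy_with_logits(pred, ground_truth)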

I am getting good results for the Company Site embeddings, but the embeddings of the other nodes do not reflect their proximity in the graph.

The original GraphSAGE paper mentions using random walks for this purpose, but I could not find an example of doing this for heterogeneous graphs.
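The random-walk idea from the GraphSAGE paper can be sketched for a heterogeneous graph by letting walks cross node types freely: any two nodes that co-occur within a short window of the same walk form a positive pair, whatever their types. The sketch below is plain Python on a hypothetical toy graph; the node ids and the `walks_per_node`, `walk_length` and `window` parameters are assumptions, and the resulting pairs would still have to be mapped to the node indices of the `HeteroData` object. PyG's `MetaPath2Vec` is a related, type-aware alternative that restricts walks to a metapath, but it learns a lookup embedding per node and is therefore not inductive.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency over (node_type, node_id) tuples, so walks can cross types."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
        adj[dst].append(src)
    return adj


def random_walk(adj, start, walk_length, rng):
    """Uniform random walk of at most `walk_length` steps, ignoring node types."""
    walk = [start]
    for _ in range(walk_length):
        neighbors = adj[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


def positive_pairs(adj, walks_per_node=2, walk_length=3, window=2, seed=0):
    """(anchor, context) pairs that co-occur within `window` steps of a walk."""
    rng = random.Random(seed)
    pairs = set()
    for start in list(adj):
        for _ in range(walks_per_node):
            walk = random_walk(adj, start, walk_length, rng)
            for i, u in enumerate(walk):
                for v in walk[i + 1 : i + 1 + window]:
                    if u != v:
                        pairs.add((u, v))
    return pairs


# Hypothetical toy graph: one site with an ISIC chain and a country/region chain.
edges = [
    (("site", 0), ("isic", 0)),
    (("isic", 0), ("isic_group", 0)),
    (("site", 0), ("country", 0)),
    (("country", 0), ("region", 0)),
]
pairs = positive_pairs(build_adjacency(edges))
```

Pairs such as `(("region", 0), ("country", 0))` connect node types that a loss on `("site", "has_isic", "isic")` alone never touches, while nodes more than `window` hops apart never form a pair.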

How can I train my model on more than one edge type? Are there any other approaches possible with PyG?
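One straightforward way to train on several edge types is to compute the same binary link-prediction loss for each supervised edge type and average the results, so that gradients reach every node type that takes part in some supervised edge. The plain-PyTorch sketch below assumes a dot-product decoder on the output embeddings of a heterogeneous GNN (a `z_dict` keyed by node type, as a `to_hetero` model returns) and per-type `(edge_label_index, edge_label)` pairs; the function name and the second edge type are hypothetical. Since `LinkNeighborLoader` takes its supervision edges for one edge type at a time, these pairs might come from one loader per edge type or from separate negative sampling; that plumbing is left out here.

```python
import torch
import torch.nn.functional as F


def multi_edge_type_loss(z_dict, supervision):
    """Average binary link-prediction loss over several supervised edge types.

    z_dict: {node_type: [num_nodes, dim] output embeddings}
    supervision: {(src_type, rel, dst_type): (edge_label_index, edge_label)}
    """
    losses = []
    for (src_type, _, dst_type), (edge_label_index, edge_label) in supervision.items():
        z_src = z_dict[src_type][edge_label_index[0]]
        z_dst = z_dict[dst_type][edge_label_index[1]]
        # Dot-product decoder: one logit per candidate (src, dst) pair.
        logits = (z_src * z_dst).sum(dim=-1)
        losses.append(F.binary_cross_entropy_with_logits(logits, edge_label.float()))
    return torch.stack(losses).mean()


# Toy usage: ("isic", "is_in", "isic_group") is a hypothetical second edge type.
torch.manual_seed(0)
z_dict = {
    "site": torch.randn(4, 8, requires_grad=True),
    "isic": torch.randn(3, 8, requires_grad=True),
    "isic_group": torch.randn(2, 8, requires_grad=True),
}
supervision = {
    ("site", "has_isic", "isic"): (torch.tensor([[0, 1, 2], [0, 1, 2]]), torch.tensor([1, 1, 0])),
    ("isic", "is_in", "isic_group"): (torch.tensor([[0, 1], [0, 1]]), torch.tensor([1, 0])),
}
loss = multi_edge_type_loss(z_dict, supervision)
loss.backward()
```

After `backward()`, `z_dict["isic_group"]` receives a gradient, which it would not with the `("site", "has_isic", "isic")` loss alone.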

Thank you in advance for your insights!

rusty1s (Maintainer) · 2024-02-09T12:11:58Z

I think what you do is fully correct. Another unsupervised heterogeneous approach would be via infomax (see the hetero folder for an example). I am not sure what you mean by "other nodes do not reflect their proximity", though. Do you mean nodes of other node types? This would be expected since they are not trained to do so.
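For reference, the Deep Graph Infomax (DGI) objective, which PyG's `DeepGraphInfomax` module implements, can be sketched in plain PyTorch: the graph is corrupted (here by shuffling node features across nodes), both versions are encoded, and a bilinear discriminator learns to tell real node embeddings from corrupted ones relative to a sigmoid-of-mean summary vector. Computing one summary and one loss term per node type and summing them is an assumption made here for the heterogeneous case, not a documented PyG recipe. The linear `encoder` is only a stand-in for a heterogeneous GNN; with a real GNN the corruption would also change the messages each node receives from its neighborhood.

```python
import torch
import torch.nn.functional as F


def corruption(x):
    """DGI-style corruption: shuffle node features across nodes."""
    return x[torch.randperm(x.size(0))]


def dgi_loss(pos_z, neg_z, weight):
    """Deep Graph Infomax loss for one node type.

    pos_z / neg_z: [num_nodes, dim] embeddings of the real / corrupted graph.
    weight: [dim, dim] parameter of the bilinear discriminator.
    """
    summary = torch.sigmoid(pos_z.mean(dim=0))  # summary vector of the real graph
    pos_logits = pos_z @ (weight @ summary)     # discriminator scores, real nodes
    neg_logits = neg_z @ (weight @ summary)     # discriminator scores, corrupted nodes
    return (F.binary_cross_entropy_with_logits(pos_logits, torch.ones_like(pos_logits))
            + F.binary_cross_entropy_with_logits(neg_logits, torch.zeros_like(neg_logits)))


# Toy usage with a stand-in encoder; a real model would be a heterogeneous GNN.
torch.manual_seed(0)
encoder = torch.nn.Linear(16, 8)
weight = torch.nn.Parameter(torch.empty(8, 8))
torch.nn.init.xavier_uniform_(weight)
x_dict = {"site": torch.randn(10, 16), "isic": torch.randn(5, 16)}
# Hypothetical heterogeneous variant: one DGI term per node type, summed.
loss = sum(dgi_loss(encoder(x), encoder(corruption(x)), weight) for x in x_dict.values())
loss.backward()
```

Because the objective needs no link labels, every node type that the encoder embeds gets a training signal.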


13bmartens (Author) · Feb 9, 2024

I could only find infomax_inductive.py, which is outside the hetero folder. Could you point me to the right file?

> Do you mean nodes of other node types? This would be expected since they are not trained to do so.

Yes, exactly: nodes of types other than ISIC and Site. How do I change this? I would like them to be trained that way too.
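One option for making nodes of every type reflect graph proximity (an assumption, not something confirmed in this thread) is the unsupervised loss from the GraphSAGE paper, $J(\mathbf{z}_u) = -\log\sigma(\mathbf{z}_u^\top \mathbf{z}_v) - Q\,\mathbb{E}_{v_n \sim P_n(v)} \log\sigma(-\mathbf{z}_u^\top \mathbf{z}_{v_n})$, applied to co-occurring (anchor, context) pairs of any node types, for example pairs taken from random walks. The sketch below assumes all node types are embedded into the same dimension and stacked into one matrix with per-type offsets, and it draws negatives uniformly at random for simplicity; `walk_pair_loss` and the offset indexing are hypothetical.

```python
import torch
import torch.nn.functional as F


def walk_pair_loss(z, anchor, context, num_neg=5):
    """GraphSAGE-style unsupervised loss on (anchor, context) index pairs.

    z: [num_nodes, dim] embeddings of ALL node types stacked into one matrix
       (hypothetical global indexing via per-type offsets).
    anchor, context: [num_pairs] indices of co-occurring nodes.
    num_neg: Q, the number of negatives per pair (sampled uniformly here).
    """
    z_u, z_v = z[anchor], z[context]
    pos_logits = (z_u * z_v).sum(dim=-1)
    neg_idx = torch.randint(0, z.size(0), (anchor.numel(), num_neg))
    neg_logits = (z_u.unsqueeze(1) * z[neg_idx]).sum(dim=-1)  # [num_pairs, Q]
    # -log σ(z_u·z_v), averaged over pairs.
    pos_loss = -F.logsigmoid(pos_logits).mean()
    # -Q · E[log σ(-z_u·z_vn)]: Q times the sample mean equals the sum over negatives.
    neg_loss = -F.logsigmoid(-neg_logits).sum(dim=-1).mean()
    return pos_loss + neg_loss


# Toy usage: a site and a region that co-occurred in a walk.
torch.manual_seed(0)
z_dict = {
    "site": torch.randn(4, 8, requires_grad=True),
    "region": torch.randn(2, 8, requires_grad=True),
}
offsets = {"site": 0, "region": 4}
z = torch.cat([z_dict["site"], z_dict["region"]], dim=0)
anchor = torch.tensor([offsets["site"] + 1])
context = torch.tensor([offsets["region"] + 0])
loss = walk_pair_loss(z, anchor, context)
loss.backward()
```

Here the `region` embeddings receive a gradient directly from the pair loss, independent of any ISIC supervision.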
