Heterogeneous GNN Inductive Learning #7607

Rico2000 · 2023-06-19T06:33:55Z

Rico2000
Jun 19, 2023

Hey,

I have a question regarding heterogeneous data. In my dataset, I have transactional data that consists of various nodes such as customers, transactions (customer-customer), devices, etc. These transactions have a temporal aspect and were performed at a specific point in the past. The objective is to perform node classification on the transactions, while avoiding the prediction of other nodes.

I have implemented the train-test split based on the temporal axis to enable inductive learning. However, I'm wondering how to handle the other nodes, as they can also change over time. For instance, new customers or devices may be added. In the example of heterogeneous data provided on GitHub (https://github.com/pyg-team/pytorch_geometric/blob/master/examples/hetero/to_hetero_mag.py), the train-test split is only applied to the "paper" nodes. In my case, it would be beneficial to perform a train-test split for other nodes as well, ensuring that no information from the test set influences the training process. Otherwise, the graph would contain customers who do not yet exist. Is it possible to implement this, and if so, how can it be done?

rusty1s · 2023-06-19T08:03:37Z

rusty1s
Jun 19, 2023
Maintainer

There are two ways to achieve this:

1. Write your own temporal splitting logic, taking care to remove all information that leaks from the future, across all node types.
2. Rely on the temporal sampling routine integrated into PyG. See the time_attr and input_time arguments of NeighborLoader.

Rico2000 · 2023-06-19T08:29:00Z

Rico2000
Jun 19, 2023
Author

Thanks for the fast response!
To clarify my example, I have 9 months of training data and 3 months of test data. Currently, I have adjusted the train_mask and val_mask to split the node type "Transaction" based on the 9 and 3-month periods, respectively. The other node types do not have a mask. Would it be sufficient to add the "Time" attribute to all nodes, or do I need to create a mask for the other node types (Customer, Device, etc.) as well?

From my understanding of your response, I gather that I need to create a mask for all node types.

rusty1s Jun 20, 2023
Maintainer

Sorry if my answer was not clear enough. In particular, for point (1) you don't want to create a mask for every node type, but you want to create separate data objects for training, validation, and testing. Each of these data objects then only contains visible information up to this point in time.
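
To make the "separate data objects" idea concrete, here is a minimal, framework-free sketch. It assumes every node of every type carries a timestamp (here: a month index) and that edges are stored as (src, dst) index pairs per edge type. The function name temporal_snapshot, the toy data, and the cutoffs are all hypothetical and not part of PyG; the resulting node ids and relabelled edges would still have to be copied into one data object per split.

```python
from typing import Dict, List, Tuple

NodeTimes = Dict[str, List[int]]  # node type -> one timestamp per node
EdgeType = Tuple[str, str, str]   # (source type, relation, destination type)
Edges = Dict[EdgeType, List[Tuple[int, int]]]  # edge type -> (src, dst) pairs


def temporal_snapshot(node_times: NodeTimes, edges: Edges, cutoff: int):
    """Keep nodes with timestamp <= cutoff and edges between kept nodes.

    Returns (kept, new_edges): `kept` maps each node type to the original
    ids that survive; `new_edges` uses indices relabelled to 0..n-1.
    """
    kept: Dict[str, List[int]] = {}
    remap: Dict[str, Dict[int, int]] = {}
    for ntype, times in node_times.items():
        ids = [i for i, t in enumerate(times) if t <= cutoff]
        kept[ntype] = ids
        remap[ntype] = {old: new for new, old in enumerate(ids)}

    new_edges: Edges = {}
    for (src_t, rel, dst_t), pairs in edges.items():
        # Drop every edge that touches a node from the future.
        new_edges[(src_t, rel, dst_t)] = [
            (remap[src_t][s], remap[dst_t][d])
            for s, d in pairs
            if s in remap[src_t] and d in remap[dst_t]
        ]
    return kept, new_edges


# Hypothetical toy graph: timestamps are months 1..12.
node_times = {
    "customer": [1, 2, 11],
    "transaction": [3, 5, 10],
}
edges = {
    ("customer", "makes", "transaction"): [(0, 0), (1, 1), (2, 2), (0, 2)],
}

# Training graph: only what existed during the first 9 months.
train_kept, train_edges = temporal_snapshot(node_times, edges, cutoff=9)
# Test graph: everything visible up to month 12.
test_kept, test_edges = temporal_snapshot(node_times, edges, cutoff=12)
```

With this kind of split, the training object never contains a customer or device that did not yet exist, and no per-type mask is needed. Under the same assumptions, evaluation on the test object would be restricted to the transactions that fall after the training cutoff.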

If you want to use temporal sampling from point (2), then the idea is to operate on a single data object, and let temporal sampling take care of avoiding data leakage. That is, all nodes have a time attribute, and temporal sampling will then only sample nodes that have a timestamp less than or equal to the seed timestamp.
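
The constraint that temporal sampling enforces can be illustrated with a toy sampler. This is only a sketch of the rule "neighbors must not be newer than the seed", not PyG's implementation; the function sample_temporal_neighbors, the adjacency dictionary, and the timestamps are hypothetical, and a single node type is used for brevity.

```python
import random
from typing import Dict, List


def sample_temporal_neighbors(
    neighbors: Dict[int, List[int]],
    node_time: List[int],
    seed: int,
    seed_time: int,
    k: int,
    rng: random.Random,
) -> List[int]:
    """Sample up to k neighbors of `seed` with timestamp <= seed_time."""
    # Filter first, so that nodes from the future can never be drawn.
    valid = [n for n in neighbors.get(seed, []) if node_time[n] <= seed_time]
    if len(valid) <= k:
        return valid
    return rng.sample(valid, k)


# Hypothetical toy data: node i was created at node_time[i].
node_time = [1, 2, 4, 7, 9]
neighbors = {3: [0, 1, 2, 4]}  # node 3 (time 7) is linked to nodes 0, 1, 2, 4

rng = random.Random(0)
sampled = sample_temporal_neighbors(
    neighbors, node_time, seed=3, seed_time=node_time[3], k=2, rng=rng
)
all_valid = sample_temporal_neighbors(
    neighbors, node_time, seed=3, seed_time=node_time[3], k=10, rng=rng
)
```

Node 4 is newer than the seed, so it is never returned, even though the edge to it is stored in the same graph. In this picture, the seed timestamp plays the role of the input_time argument mentioned above.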

Answer selected by Rico2000

Rico2000 Jun 21, 2023
Author

Okay, I think I got the point. Thanks!
