Multiple datasets and batch setting for graph-based deep learning model #8810
Replies: 2 comments · 3 replies
- Sorry if I am misunderstanding: do you want to train 5 individual models and then use their ensemble to make the final prediction? Or do you want to share the backbone across these different models?
- I closed this discussion since it looks too dependent on others' opinions, rather than being a discussion about PyG-related warnings or errors.
Original post:

Dear PyG community,

Greetings, and thank you, as always, for your effort in updating the package. Although I asked about this briefly here, I'm still unsure about my logic, and it would be helpful if anyone could give feedback or direction. Before starting, I apologize for the messy sentences and figures :(
My objective is link prediction (i.e., edge classification) using a backbone network and multiple datasets that contain only node features and binary edge labels. For reproducibility, I made example datasets; note that `edge_index` in these datasets is not used for message passing. I want to build one end-to-end model that learns the meaningful patterns in each dataset and predicts links when I get new node features.
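A minimal sketch of what such example datasets could look like; all sizes and names here are hypothetical, and `edge_index` stores the labeled pairs rather than message-passing edges, matching the note above:

```python
import torch
from torch_geometric.data import Data

def make_example_dataset(num_nodes=100, num_feats=16, num_pairs=500, seed=0):
    # Toy dataset: node features plus binary-labeled node pairs.
    # Here `edge_index` holds the pairs to classify; it is NOT a
    # message-passing graph.
    g = torch.Generator().manual_seed(seed)
    x = torch.randn(num_nodes, num_feats, generator=g)
    edge_index = torch.randint(0, num_nodes, (2, num_pairs), generator=g)
    edge_label = torch.randint(0, 2, (num_pairs,), generator=g).float()
    return Data(x=x, edge_index=edge_index, edge_label=edge_label)

datasets = [make_example_dataset(seed=i) for i in range(5)]  # e.g., 5 datasets
```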

Fortunately, I got this working for a single dataset with 5-fold cross-validation. Then I simply sketched a model for the multiple datasets as below; all classification models have the same GAT structure.

However, I have some questions about the logic.

Q1. If I should allocate one GAT model per dataset, how should I load the datasets under 5-fold CV?
In a previous trial, I tried `DataLoader` or `Batch.from_data_list()` (both returned the same result) to get the multiple datasets. However, it is difficult to combine `DataLoader` with 5-fold CV, since I already use `LinkNeighborLoader` for edge classification and the two would conflict. For a single dataset, I ran code along the lines of the sketch below:
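A minimal sketch of such a per-fold setup, assuming scikit-learn's `KFold` over the labeled pairs and a hypothetical message-passing graph built with `knn_graph` (needed because `edge_index` in the toy data is not a message-passing graph):

```python
import numpy as np
from sklearn.model_selection import KFold
from torch_geometric.data import Data
from torch_geometric.loader import LinkNeighborLoader
from torch_geometric.nn import knn_graph

dataset = datasets[0]                        # one toy dataset from above
mp_edge_index = knn_graph(dataset.x, k=10)   # hypothetical message-passing graph
graph = Data(x=dataset.x, edge_index=mp_edge_index)

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
num_pairs = dataset.edge_index.size(1)

for fold, (train_idx, val_idx) in enumerate(kfold.split(np.arange(num_pairs))):
    # Sample subgraphs around the training pairs of this fold.
    train_loader = LinkNeighborLoader(
        graph,
        num_neighbors=[10, 10],
        edge_label_index=dataset.edge_index[:, train_idx],
        edge_label=dataset.edge_label[train_idx],
        batch_size=128,
        shuffle=True,
    )
    # ... train on train_loader; evaluate on the held-out val_idx pairs ...
```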
Q2. Assume that I successfully load the multiple datasets with 5-fold CV. Then, how should I declare the model per dataset?
I considered PyTorch utilities such as `nn.ModuleList` or a `for` loop, as in similar cases 1, 2, or 3, but it looks hard because of GPU memory. Moreover, I'm not sure it is correct to instantiate the model inside the training loop as in the code above; usually, the model is declared outside of the training process (I guess).
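A minimal sketch of declaring one model per dataset once, outside the fold and epoch loops, using `nn.ModuleList`; the `GAT` class here is a hypothetical stand-in for the actual classification model, and `datasets` is the toy list from above:

```python
import torch
from torch import nn
from torch_geometric.nn import GATConv

class GAT(nn.Module):
    # Minimal 2-layer GAT encoder with a dot-product link decoder.
    def __init__(self, in_dim, hidden_dim=64, heads=4):
        super().__init__()
        self.conv1 = GATConv(in_dim, hidden_dim, heads=heads)
        self.conv2 = GATConv(hidden_dim * heads, hidden_dim, heads=1)

    def forward(self, x, edge_index, edge_label_index):
        h = self.conv1(x, edge_index).relu()
        h = self.conv2(h, edge_index)
        src, dst = edge_label_index
        return (h[src] * h[dst]).sum(dim=-1)   # one logit per candidate pair

# One model per dataset, declared once; nn.ModuleList registers every
# submodule's parameters so a single optimizer can see them all.
models = nn.ModuleList([GAT(in_dim=16) for _ in datasets])
optimizer = torch.optim.Adam(models.parameters(), lr=1e-3)
```

With this layout, only the model for the currently active dataset has to be moved to the GPU at any one time, which may ease the memory concern.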
Q3. I guess that there should be a lighter version for my task, since all datasets share the same backbone network, as in the figure below. If it is possible, can you give me any example cases?

I think that this version is much harder to construct but more similar to my task; a sketch of what I imagine is below. (If this is impossible, please let me know.)
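A minimal sketch of this lighter variant, assuming one shared GAT backbone and a small per-dataset link head; all names and sizes are hypothetical:

```python
import torch
from torch import nn
from torch_geometric.nn import GATConv

class SharedBackbone(nn.Module):
    # GAT encoder shared across all datasets.
    def __init__(self, in_dim, hidden_dim=64, heads=4):
        super().__init__()
        self.conv1 = GATConv(in_dim, hidden_dim, heads=heads)
        self.conv2 = GATConv(hidden_dim * heads, hidden_dim, heads=1)

    def forward(self, x, edge_index):
        h = self.conv1(x, edge_index).relu()
        return self.conv2(h, edge_index)

class LinkHead(nn.Module):
    # Small per-dataset classifier over pairs of node embeddings.
    def __init__(self, hidden_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * hidden_dim, hidden_dim),
                                 nn.ReLU(), nn.Linear(hidden_dim, 1))

    def forward(self, h, edge_label_index):
        src, dst = edge_label_index
        return self.mlp(torch.cat([h[src], h[dst]], dim=-1)).squeeze(-1)

backbone = SharedBackbone(in_dim=16)
heads = nn.ModuleList([LinkHead() for _ in datasets])  # one head per dataset
params = list(backbone.parameters()) + list(heads.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
```

Here the backbone weights are shared and updated by every dataset, while each head stays dataset-specific.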
Since I'm still trying to fix my code and the idea is not fully clear yet, the example codes are not perfect. Thank you so much for reading this long question, and have a nice day!