Replies: 17 comments 137 replies
-
Also, if we have 2 node types (papers and authors) and 2 edge types (paper -> author and its reverse), and the objective is to classify the paper nodes, then how does the figure (architecture) given in that link apply? Could you please help me with this? I am not able to draw an architecture for this case by myself. Thank you once again!
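A minimal sketch of one possible architecture for this setting, following the usual PyG `HeteroConv` pattern; the edge type names ('writes' / 'rev_writes'), hidden size, and the two-layer depth are illustrative assumptions, not something fixed by the thread:

```python
import torch
from torch_geometric.nn import HeteroConv, SAGEConv

class PaperClassifier(torch.nn.Module):
    """Two HeteroConv layers over both relations, then a head on papers."""
    def __init__(self, hidden_channels, num_classes):
        super().__init__()
        # One SAGEConv per edge type and per layer, each with its own weights.
        self.conv1 = HeteroConv({
            ('author', 'writes', 'paper'): SAGEConv((-1, -1), hidden_channels),
            ('paper', 'rev_writes', 'author'): SAGEConv((-1, -1), hidden_channels),
        }, aggr='sum')
        self.conv2 = HeteroConv({
            ('author', 'writes', 'paper'): SAGEConv((-1, -1), hidden_channels),
            ('paper', 'rev_writes', 'author'): SAGEConv((-1, -1), hidden_channels),
        }, aggr='sum')
        self.lin = torch.nn.Linear(hidden_channels, num_classes)

    def forward(self, x_dict, edge_index_dict):
        x_dict = {k: v.relu() for k, v in self.conv1(x_dict, edge_index_dict).items()}
        x_dict = self.conv2(x_dict, edge_index_dict)
        return self.lin(x_dict['paper'])  # logits for paper nodes only
```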
-
Thanks a lot for your kind replies! :)
-
Hi, hope you are doing well! On the basis of our discussion, I have a couple of questions:
1. I noticed that when I used ImbalancedSampler during training of the GNN model, the performance was no better than without it.
2. How can I further improve the performance of the model?
The task I am working on is as follows: I have 2 sets of nodes (say type 1 and type 2) and a large number of edges between type 1 and type 2. The data stats are: #edges: 116,828,882; #nodes of type 1: 15,327,758; #nodes of type 2: 5,018,597. I am trying to classify type 2 nodes as either 1 or 0, where 77,281 type 2 nodes are in class 1 and 4,941,316 are in class 0.
Currently I am just running the same model as in this link: https://github.com/pyg-team/pytorch_geometric/blob/master/examples/hetero/hetero_conv_dblp.py. Could you please help me with what can be done to further improve the F1 score? Currently, class 0 precision/recall are 0.98/1.00 and class 1 precision/recall are 0.40/0.03.
Thanks so much! :)
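With a roughly 64:1 class imbalance, one common alternative to resampling is to reweight the loss. A minimal sketch, assuming `out` holds the model's logits for the type 2 seed nodes and `y` their labels (both names are placeholders):

```python
import torch
import torch.nn.functional as F

# Class counts taken from the question: ~4.94M class 0 vs ~77k class 1.
counts = torch.tensor([4941316.0, 77281.0])
weight = counts.sum() / (2.0 * counts)  # inverse-frequency class weights

# `out`: [num_seed_nodes, 2] logits; `y`: [num_seed_nodes] labels.
loss = F.cross_entropy(out, y, weight=weight)
```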
-
Yes, it could be because of the imbalance in the dataset. Could you please elaborate on how I can oversample, given that I would need to create node features as well as the corresponding edges? (One possible approach is sketched below.)
…On Fri, Dec 23, 2022 at 4:55 PM zzzLemon ***@***.***> wrote:
Is it because the labels are imbalanced, i.e., nodes in class 0 of type 2 far outnumber nodes in class 1 of type 2? Maybe you can try under/oversampling? Correct me if I have understood your problem wrongly :)
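One way to oversample here without materializing new node features or edges, as hinted above: repeat the minority-class seed indices in the `input_nodes` passed to `NeighborLoader`, so minority seeds are drawn more often while the graph itself stays untouched. A sketch, with all node-type and attribute names assumed:

```python
import torch
from torch_geometric.loader import NeighborLoader

# Assumption: `data` is a HeteroData object whose type 2 nodes carry a
# binary label in data['type2'].y.
y = data['type2'].y
pos = (y == 1).nonzero(as_tuple=True)[0]  # minority-class indices
neg = (y == 0).nonzero(as_tuple=True)[0]

# Repeat minority seeds so both classes appear roughly equally often.
ratio = max(neg.numel() // max(pos.numel(), 1), 1)
seeds = torch.cat([neg, pos.repeat(ratio)])

loader = NeighborLoader(
    data,
    num_neighbors=[10, 10],        # 2-hop sampling
    input_nodes=('type2', seeds),
    batch_size=1024,
    shuffle=True,
)
```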
-
I think oversampling or downsampling is needed. I found an interesting paper that does oversampling for graph classification.
-
Also, could you please suggest some better ways to encode the categorical features efficiently for my heterogeneous graph?
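One common pattern (a sketch; the column cardinalities and embedding size below are made up): give each categorical column its own learnable embedding table and concatenate the results into the node feature matrix.

```python
import torch

class CategoricalEncoder(torch.nn.Module):
    """One nn.Embedding per categorical column, outputs concatenated."""
    def __init__(self, cardinalities, dim=16):
        super().__init__()
        self.embs = torch.nn.ModuleList(
            torch.nn.Embedding(num_categories, dim)
            for num_categories in cardinalities)

    def forward(self, cat):  # cat: [num_nodes, num_columns] integer codes
        return torch.cat(
            [emb(cat[:, i]) for i, emb in enumerate(self.embs)], dim=-1)

# Example: three categorical columns with 12, 7, and 300 categories.
encoder = CategoricalEncoder([12, 7, 300], dim=16)
x = encoder(torch.randint(0, 7, (100, 3)))  # -> [100, 48]
```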
-
Hi, I am not sure whether the code I have written is correct, so I would like to request your guidance on whether the following code is error free. Thanks very much! :)
Test code: loading the new data, normalizing it, and creating the hetero graph (as I did in the training file above).
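For reference, a minimal sketch of what such a test-time pipeline usually looks like; all tensor names are placeholders. The one point worth double-checking is that the new data is normalized with the statistics computed on the training set, not with its own:

```python
import torch
from torch_geometric.data import HeteroData

# Placeholders: raw feature tensors of the new data and the
# mean/std computed on the TRAINING data.
data = HeteroData()
data['type1'].x = (x1_new - x1_train_mean) / x1_train_std
data['type2'].x = (x2_new - x2_train_mean) / x2_train_std
data['type1', 'to', 'type2'].edge_index = edge_index_new

model.eval()
with torch.no_grad():
    out = model(data.x_dict, data.edge_index_dict)
```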
-
I am creating a 2-layer GNN.
This is the sampler:
[screenshot of the sampler configuration]
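A sketch of how a 2-layer hetero GNN and its sampler typically line up: the sampler's `num_neighbors` list should have exactly as many entries as the model has message-passing layers (two here). Sizes and type names are assumptions:

```python
import torch
from torch_geometric.nn import SAGEConv, to_hetero

class GNN(torch.nn.Module):
    def __init__(self, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = SAGEConv((-1, -1), hidden_channels)
        self.conv2 = SAGEConv((-1, -1), out_channels)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        return self.conv2(x, edge_index)

# Lift the homogeneous model to all node/edge types of the hetero graph;
# pair it with a sampler using num_neighbors=[a, b] (two hops, two layers).
model = to_hetero(GNN(64, 2), data.metadata(), aggr='sum')
```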
-
Hi @rusty1s! I also tried using unsupervised methods to train embeddings. I took the code available for link prediction and modified it a bit to be able to train in mini-batches and leverage the LinkLoader.
During training, the loss comes out as follows:
I couldn't figure out what is wrong in the implementation. Could you please guide me on why the model could be behaving like this? Thanks very much!
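For comparison, a sketch of the usual mini-batch unsupervised objective with `LinkNeighborLoader` (the concrete PyG loader built on `LinkLoader`); the node/edge type names and the dot-product decoder are assumptions:

```python
import torch.nn.functional as F
from torch_geometric.loader import LinkNeighborLoader

edge_type = ('type1', 'to', 'type2')
loader = LinkNeighborLoader(
    data,
    num_neighbors=[10, 10],
    edge_label_index=(edge_type, data[edge_type].edge_index),
    neg_sampling_ratio=1.0,   # one random negative per positive edge
    batch_size=2048,
    shuffle=True,
)

for batch in loader:
    z = model(batch.x_dict, batch.edge_index_dict)
    src, dst = batch[edge_type].edge_label_index
    score = (z['type1'][src] * z['type2'][dst]).sum(dim=-1)
    # edge_label is 1 for real edges and 0 for sampled negatives.
    loss = F.binary_cross_entropy_with_logits(
        score, batch[edge_type].edge_label.float())
```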
-
Hi! Is there any way I can use a distribution to sample neighbors in NeighborLoader? I intend to mimic this behavior (page 4, equation 4) for neighbor sampling.
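Recent PyG releases expose a `weight_attr` argument on `NeighborLoader` (backed by `pyg-lib`) that biases sampling by a per-edge weight; if equation 4 can be expressed as unnormalized per-edge probabilities, something like this sketch may work (names are assumptions, and availability depends on your PyG/pyg-lib versions):

```python
from torch_geometric.loader import NeighborLoader

# Assumption: per-edge unnormalized sampling probabilities stored as an
# `edge_weight` attribute on every edge type.
data['type1', 'to', 'type2'].edge_weight = w

loader = NeighborLoader(
    data,
    num_neighbors=[10, 10],
    weight_attr='edge_weight',   # weighted/biased neighbor sampling
    input_nodes=('type2', seeds),
    batch_size=512,
)
```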
-
Hi @rusty1s, I declared a scalar learnable weight which is multiplied with the embeddings before running a classifier. It is throwing: KeyError: 'w1'. Could you guide me on where I am going wrong?
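A KeyError like this often comes from storing the scalar in a plain Python dict instead of registering it on the module. A minimal sketch of a correctly registered scalar weight (class name and shapes are assumptions):

```python
import torch

class ScaledClassifier(torch.nn.Module):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        # nn.Parameter registers `w1` with the module, so it shows up in
        # parameters(), the optimizer, and the state dict; a plain dict
        # of tensors does not, and commonly triggers KeyErrors later.
        self.w1 = torch.nn.Parameter(torch.tensor(1.0))
        self.lin = torch.nn.Linear(in_channels, num_classes)

    def forward(self, z):
        return self.lin(self.w1 * z)
```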
-
Hi @rusty1s! In my HeteroData (2 node types, with edges going from node type 1 to node type 2), I have very few positive-label nodes of type 1 (my target class, which I want to classify). That's why I was planning to train the model by combining supervised and unsupervised losses. My doubt was:
For each batch I minimize the cross-entropy loss on the node type 1 labels, and I would now like to introduce an unsupervised loss alongside this classification loss. Questions:
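For the combination itself, the usual recipe is a weighted sum of the two objectives. A sketch, assuming `logits`/`y` come from the labeled type 1 seed nodes and `unsup_loss` is a link-prediction-style loss computed on the same batch (`lam` is a hyperparameter to tune, not a prescribed value):

```python
import torch.nn.functional as F

lam = 0.5  # supervised/unsupervised trade-off (assumption; tune it)

sup_loss = F.cross_entropy(logits, y)   # labeled type 1 seed nodes
loss = sup_loss + lam * unsup_loss      # joint objective
loss.backward()
```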
-
Hi @rusty1s! I am working on a hetero graph with 2 node types, c and m, and creating batches using NeighborLoader where the seed nodes are of type c. Could you please tell me whether a single batch can contain repeated m nodes? Also, will the repeated m nodes have the same embedding or not? Thank you!!
-
Hi @rusty1s! I am running some experiments on heterogeneous graphs. I tried SAGEConv (using NeighborLoader, since the graph is very large), which gives a decent score. GraphConv without NeighborLoader also gives a decent score, but GraphConv (aggr set to 'mean') with NeighborLoader gives very poor results. I thought it should behave similarly to SAGEConv. Could you please guide me regarding this issue? Thank you!
-
Hi @rusty1s! My graph is large and heterogeneous. I am using NeighborLoader to create mini-batches, sampling neighbors up to 2 hops. I want to ask: within a batch, will the nodes (of each type) be unique, or could there be repetitions, since a particular node can be a neighbor of 2 or more nodes? Thank you!!
-
Hi @rusty1s! Hope you are doing well.
I am writing the above code to get the 1st- and 2nd-layer embeddings. On these 2 embeddings I train a classifier. Once the model is trained, I use these embeddings to train a separate tree-based model with both layers' embeddings (this is for a task in which we are supposed to use the GNN embeddings along with some raw features, so the two models are trained separately; only the embeddings are taken from the GNN output). My issue is that I am getting low accuracy with this tree-based model. Can you help me figure out whether there is some mistake in treating these GNN layers' outputs as the embeddings? Should I consider adding an activation function, or am I missing something else? I have been stuck for a long time, so I thought of asking for your help. Thank you!
-
Hi, I am working on heterogeneous GNNs and have the following queries:
https://pytorch-geometric.readthedocs.io/en/latest/notes/heterogeneous.html
1. In this link, the author explains a GNN model for heterogeneous graphs through a diagram. Could you please explain why there is a separate SAGEConv layer between the x_paper and x_author nodes? I thought the output of the same SAGEConv layer that is used to compute the hidden features of the paper nodes would be passed to the x_author nodes, but a separate SAGEConv layer has been added.
2. Could you also explain how NeighborLoader works: how are the batches created, and does the function consider all the neighbors of the nodes in the current batch and then their neighbors (i.e., neighbors of neighbors)? If so, couldn't that pull all the nodes into the same batch? I am extremely sorry for such a vague explanation.
Thanks very much!
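On the second query: `NeighborLoader` does not take all neighbors; `num_neighbors` caps how many are sampled per node at each hop, so a batch can never grow into the whole graph. A sketch, assuming paper nodes are the seeds:

```python
from torch_geometric.loader import NeighborLoader

# At most 10 neighbors per seed in hop 1 and 5 per sampled node in
# hop 2, so a batch holds at most ~128 * (1 + 10 + 10 * 5) nodes.
loader = NeighborLoader(
    data,
    num_neighbors=[10, 5],
    input_nodes='paper',   # all paper nodes as seeds
    batch_size=128,
)
```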