Replies: 17 comments 137 replies
-
Also, if we have 2 node types (papers and authors) and 2 edge types (paper -> author and its reverse), and the objective is to classify the paper nodes, then how does the figure (architecture) given in that link apply? Could you please help me with this? I am not able to draw an architecture for this case by myself. Thank you once again!
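A minimal sketch of one possible architecture for this setting, following the usual PyG `HeteroConv` pattern; the edge type names ('writes' / 'rev_writes'), hidden size, and the two-layer depth are illustrative assumptions, not something fixed by the thread:

```python
import torch
from torch_geometric.nn import HeteroConv, SAGEConv

class PaperClassifier(torch.nn.Module):
    """Two HeteroConv layers over both relations, then a head on papers."""
    def __init__(self, hidden_channels, num_classes):
        super().__init__()
        # One SAGEConv per edge type and per layer, each with its own weights.
        self.conv1 = HeteroConv({
            ('author', 'writes', 'paper'): SAGEConv((-1, -1), hidden_channels),
            ('paper', 'rev_writes', 'author'): SAGEConv((-1, -1), hidden_channels),
        }, aggr='sum')
        self.conv2 = HeteroConv({
            ('author', 'writes', 'paper'): SAGEConv((-1, -1), hidden_channels),
            ('paper', 'rev_writes', 'author'): SAGEConv((-1, -1), hidden_channels),
        }, aggr='sum')
        self.lin = torch.nn.Linear(hidden_channels, num_classes)

    def forward(self, x_dict, edge_index_dict):
        x_dict = {k: v.relu() for k, v in self.conv1(x_dict, edge_index_dict).items()}
        x_dict = self.conv2(x_dict, edge_index_dict)
        return self.lin(x_dict['paper'])  # logits for paper nodes only
```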
-
Thanks a lot for your kind replies! :)
-
Hi, hope you are doing well! On the basis of our discussion, I have a couple of questions:
1. I noticed that when I used ImbalancedSampler during training of the GNN model, the performance was no better than without it.
2. How can I further improve the performance of the model?
The task I am working on is as follows: I have 2 sets of nodes (say type 1 and type 2) and a large number of edges between type 1 and type 2. The data stats are: #edges: 116,828,882; #nodes of type 1: 15,327,758; #nodes of type 2: 5,018,597. I am trying to classify type 2 nodes as either 1 or 0, where 77,281 type 2 nodes are in class 1 and 4,941,316 are in class 0.
Currently I am just running the same model as in this link: https://github.com/pyg-team/pytorch_geometric/blob/master/examples/hetero/hetero_conv_dblp.py. Could you please help me with what can be done to further improve the F1 score? Currently, class 0 precision/recall are 0.98/1.00 and class 1 precision/recall are 0.40/0.03.
Thanks so much! :)
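With a roughly 64:1 class imbalance, one common alternative to resampling is to reweight the loss. A minimal sketch, assuming `out` holds the model's logits for the type 2 seed nodes and `y` their labels (both names are placeholders):

```python
import torch
import torch.nn.functional as F

# Class counts taken from the question: ~4.94M class 0 vs ~77k class 1.
counts = torch.tensor([4941316.0, 77281.0])
weight = counts.sum() / (2.0 * counts)  # inverse-frequency class weights

# `out`: [num_seed_nodes, 2] logits; `y`: [num_seed_nodes] labels.
loss = F.cross_entropy(out, y, weight=weight)
```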
-
Yes, it could be because of the imbalance in the dataset. Could you please elaborate on how I can oversample, given that I would need to create node features as well as the corresponding edges? (One possible approach is sketched below.)
…On Fri, Dec 23, 2022 at 4:55 PM zzzLemon ***@***.***> wrote:
Is it because the labels are imbalanced, i.e., nodes in class 0 of type 2 far outnumber nodes in class 1 of type 2? Maybe you can try under/oversampling? Correct me if I have understood your problem wrongly :)
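One way to oversample here without materializing new node features or edges, as hinted above: repeat the minority-class seed indices in the `input_nodes` passed to `NeighborLoader`, so minority seeds are drawn more often while the graph itself stays untouched. A sketch, with all node-type and attribute names assumed:

```python
import torch
from torch_geometric.loader import NeighborLoader

# Assumption: `data` is a HeteroData object whose type 2 nodes carry a
# binary label in data['type2'].y.
y = data['type2'].y
pos = (y == 1).nonzero(as_tuple=True)[0]  # minority-class indices
neg = (y == 0).nonzero(as_tuple=True)[0]

# Repeat minority seeds so both classes appear roughly equally often.
ratio = max(neg.numel() // max(pos.numel(), 1), 1)
seeds = torch.cat([neg, pos.repeat(ratio)])

loader = NeighborLoader(
    data,
    num_neighbors=[10, 10],        # 2-hop sampling
    input_nodes=('type2', seeds),
    batch_size=1024,
    shuffle=True,
)
```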
-
I think oversampling or downsampling is needed. I found an interesting paper that does oversampling for graph classification.
-
Also, could you please suggest some better ways to encode the categorical features efficiently for my heterogeneous graph?
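One common pattern (a sketch; the column cardinalities and embedding size below are made up): give each categorical column its own learnable embedding table and concatenate the results into the node feature matrix.

```python
import torch

class CategoricalEncoder(torch.nn.Module):
    """One nn.Embedding per categorical column, outputs concatenated."""
    def __init__(self, cardinalities, dim=16):
        super().__init__()
        self.embs = torch.nn.ModuleList(
            torch.nn.Embedding(num_categories, dim)
            for num_categories in cardinalities)

    def forward(self, cat):  # cat: [num_nodes, num_columns] integer codes
        return torch.cat(
            [emb(cat[:, i]) for i, emb in enumerate(self.embs)], dim=-1)

# Example: three categorical columns with 12, 7, and 300 categories.
encoder = CategoricalEncoder([12, 7, 300], dim=16)
x = encoder(torch.randint(0, 7, (100, 3)))  # -> [100, 48]
```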
-
Hi, I am not sure whether the code I have written is correct, so I would like to request your guidance on whether the following code is error free. Thanks very much! :)
Test code: loading the new data, normalizing it, and creating the hetero graph (as I did in the training file above).
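For reference, a minimal sketch of what such a test-time pipeline usually looks like; all tensor names are placeholders. The one point worth double-checking is that the new data is normalized with the statistics computed on the training set, not with its own:

```python
import torch
from torch_geometric.data import HeteroData

# Placeholders: raw feature tensors of the new data and the
# mean/std computed on the TRAINING data.
data = HeteroData()
data['type1'].x = (x1_new - x1_train_mean) / x1_train_std
data['type2'].x = (x2_new - x2_train_mean) / x2_train_std
data['type1', 'to', 'type2'].edge_index = edge_index_new

model.eval()
with torch.no_grad():
    out = model(data.x_dict, data.edge_index_dict)
```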
-
I am creating a 2-layer GNN.
This is the sampler:
[screenshot of the sampler configuration]
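A sketch of how a 2-layer hetero GNN and its sampler typically line up: the sampler's `num_neighbors` list should have exactly as many entries as the model has message-passing layers (two here). Sizes and type names are assumptions:

```python
import torch
from torch_geometric.nn import SAGEConv, to_hetero

class GNN(torch.nn.Module):
    def __init__(self, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = SAGEConv((-1, -1), hidden_channels)
        self.conv2 = SAGEConv((-1, -1), out_channels)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        return self.conv2(x, edge_index)

# Lift the homogeneous model to all node/edge types of the hetero graph;
# pair it with a sampler using num_neighbors=[a, b] (two hops, two layers).
model = to_hetero(GNN(64, 2), data.metadata(), aggr='sum')
```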
-
Hi @rusty1s! I also tried using unsupervised methods to train embeddings. I took the code available for link prediction and modified it a bit to be able to train in mini-batches and leverage the LinkLoader.
During training, the loss comes out as follows:
I couldn't figure out what is wrong in the implementation. Could you please guide me on why the model could be behaving like this? Thanks very much!
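For comparison, a sketch of the usual mini-batch unsupervised objective with `LinkNeighborLoader` (the concrete PyG loader built on `LinkLoader`); the node/edge type names and the dot-product decoder are assumptions:

```python
import torch.nn.functional as F
from torch_geometric.loader import LinkNeighborLoader

edge_type = ('type1', 'to', 'type2')
loader = LinkNeighborLoader(
    data,
    num_neighbors=[10, 10],
    edge_label_index=(edge_type, data[edge_type].edge_index),
    neg_sampling_ratio=1.0,   # one random negative per positive edge
    batch_size=2048,
    shuffle=True,
)

for batch in loader:
    z = model(batch.x_dict, batch.edge_index_dict)
    src, dst = batch[edge_type].edge_label_index
    score = (z['type1'][src] * z['type2'][dst]).sum(dim=-1)
    # edge_label is 1 for real edges and 0 for sampled negatives.
    loss = F.binary_cross_entropy_with_logits(
        score, batch[edge_type].edge_label.float())
```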
-
Hi! Is there any way I can use a distribution to sample neighbors in NeighborLoader? I intend to mimic this behavior (page 4, equation 4) for neighbor sampling.
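Recent PyG releases expose a `weight_attr` argument on `NeighborLoader` (backed by `pyg-lib`) that biases sampling by a per-edge weight; if equation 4 can be expressed as unnormalized per-edge probabilities, something like this sketch may work (names are assumptions, and availability depends on your PyG/pyg-lib versions):

```python
from torch_geometric.loader import NeighborLoader

# Assumption: per-edge unnormalized sampling probabilities stored as an
# `edge_weight` attribute on every edge type.
data['type1', 'to', 'type2'].edge_weight = w

loader = NeighborLoader(
    data,
    num_neighbors=[10, 10],
    weight_attr='edge_weight',   # weighted/biased neighbor sampling
    input_nodes=('type2', seeds),
    batch_size=512,
)
```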
-
Hi @rusty1s, I declared a scalar learnable weight which is multiplied with the embeddings before running a classifier. It is throwing: KeyError: 'w1'. Could you guide me on where I am going wrong?
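A KeyError like this often comes from storing the scalar in a plain Python dict instead of registering it on the module. A minimal sketch of a correctly registered scalar weight (class name and shapes are assumptions):

```python
import torch

class ScaledClassifier(torch.nn.Module):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        # nn.Parameter registers `w1` with the module, so it shows up in
        # parameters(), the optimizer, and the state dict; a plain dict
        # of tensors does not, and commonly triggers KeyErrors later.
        self.w1 = torch.nn.Parameter(torch.tensor(1.0))
        self.lin = torch.nn.Linear(in_channels, num_classes)

    def forward(self, z):
        return self.lin(self.w1 * z)
```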
-
Hi @rusty1s! In my HeteroData (2 node types, with edges going from node type 1 to node type 2), I have very few positive-label nodes of type 1 (my target class, which I want to classify). That's why I was planning to train the model by combining supervised and unsupervised losses. My doubt was:
For each batch I minimize the cross-entropy loss on the node type 1 labels, and I would now like to introduce an unsupervised loss alongside this classification loss. Questions:
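For the combination itself, the usual recipe is a weighted sum of the two objectives. A sketch, assuming `logits`/`y` come from the labeled type 1 seed nodes and `unsup_loss` is a link-prediction-style loss computed on the same batch (`lam` is a hyperparameter to tune, not a prescribed value):

```python
import torch.nn.functional as F

lam = 0.5  # supervised/unsupervised trade-off (assumption; tune it)

sup_loss = F.cross_entropy(logits, y)   # labeled type 1 seed nodes
loss = sup_loss + lam * unsup_loss      # joint objective
loss.backward()
```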
-
Hi @rusty1s! I am working on a hetero graph with 2 node types, c and m, and creating batches using NeighborLoader where the seed nodes are of type c. Could you please tell me whether a single batch can contain repeated m nodes? Also, will the repeated m nodes have the same embedding or not? Thank you!!
-
Hi @rusty1s! I am running some experiments on heterogeneous graphs. I tried SAGEConv (using NeighborLoader, since the graph is very large), which gives a decent score. GraphConv without NeighborLoader also gives a decent score, but GraphConv (aggr set to 'mean') with NeighborLoader gives very poor results. I thought it should behave similarly to SAGEConv. Could you please guide me regarding this issue? Thank you!
-
Hi @rusty1s! My graph is large and heterogeneous. I am using NeighborLoader to create mini-batches, sampling neighbors up to 2 hops. I want to ask: within a batch, will the nodes (of each type) be unique, or could there be repetitions, since a particular node can be a neighbor of 2 or more nodes? Thank you!!
-
Hi @rusty1s! Hope you are doing well.
I am writing the above code to get the 1st- and 2nd-layer embeddings. On these 2 embeddings I train a classifier. Once the model is trained, I use these embeddings to train a separate tree-based model with both layers' embeddings (this is for a task in which we are supposed to use the GNN embeddings along with some raw features, so the two models are trained separately; only the embeddings are taken from the GNN output). My issue is that I am getting low accuracy with this tree-based model. Can you help me figure out whether there is some mistake in treating these GNN layers' outputs as the embeddings? Should I consider adding an activation function, or am I missing something else? I have been stuck for a long time, so I thought of asking for your help. Thank you!
-
Hi, I am working on heterogeneous GNNs and have the following queries:
https://pytorch-geometric.readthedocs.io/en/latest/notes/heterogeneous.html
1. In this link, the author explains a GNN model for heterogeneous graphs through a diagram. Could you please explain why there is a separate SAGEConv layer between the x_paper and x_author nodes? I thought the output of the same SAGEConv layer that is used to compute the hidden features of the paper nodes would be passed to the x_author nodes, but a separate SAGEConv layer has been added.
2. Could you also explain how NeighborLoader works: how are the batches created, and does the function consider all the neighbors of the nodes in the current batch and then their neighbors (i.e., neighbors of neighbors)? If so, couldn't that pull all the nodes into the same batch? I am extremely sorry for such a vague explanation.
Thanks very much!
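On the second query: `NeighborLoader` does not take all neighbors; `num_neighbors` caps how many are sampled per node at each hop, so a batch can never grow into the whole graph. A sketch, assuming paper nodes are the seeds:

```python
from torch_geometric.loader import NeighborLoader

# At most 10 neighbors per seed in hop 1 and 5 per sampled node in
# hop 2, so a batch holds at most ~128 * (1 + 10 + 10 * 5) nodes.
loader = NeighborLoader(
    data,
    num_neighbors=[10, 5],
    input_nodes='paper',   # all paper nodes as seeds
    batch_size=128,
)
```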