I am playing around with the data and code provided by the paper "Utilizing Molecular Network Information via Graph Convolutional Neural Networks to Predict Metastatic Event in Breast Cancer" (not important, but the code is actually provided by a follow-up paper to the one mentioned here; just thought I'd mention it to avoid confusion). Basically, the authors have gene expression data for breast cancer patients who did or did not exhibit metastasis within a period of 5 years, and they trained a GNN model to classify them as such. The idea is to "project" the expression data onto a gene interaction graph (such as a PPI graph) and to associate a graph with each patient: each node of a given graph represents a gene and holds the expression value of that gene for the patient the graph represents.

The code the authors used is mostly adapted from the "famous" and "old" paper by Defferrard et al. 2016, "Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering" (Defferrard provides the code for that paper here). That code is "old" in the sense that it uses TensorFlow v1 and does not use any graph libraries, since I don't think there were any at the time. So I wanted to try to train a model on the dataset using newer code and libraries, namely PyTorch and PyTorch Geometric.

Now, the models I built / adapted from more recent papers all face the same problem: they only ever predict the majority class (note there is no real class imbalance, the split is about 59% / 41%), so I'm stuck at 59% accuracy. What baffles me is that Defferrard's model is able to learn and actually gets around 82% accuracy.
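For concreteness, here's a minimal sketch of how I represent one patient as a PyTorch Geometric graph; the sizes and tensors below are dummy placeholders, not the paper's actual data:

```python
# Minimal sketch (dummy data, not the paper's): one patient = one PyG graph.
# The PPI network gives a shared edge_index; the patient's expression vector
# becomes the node feature matrix x (one value per gene).
import torch
from torch_geometric.data import Data

num_genes = 2000                                      # nodes = genes (placeholder)
edge_index = torch.randint(0, num_genes, (2, 10000))  # shared PPI edges (dummy)
expression = torch.randn(num_genes, 1)                # expression value per gene
label = torch.tensor([1])                             # 1 = metastasis within 5 years

patient_graph = Data(x=expression, edge_index=edge_index, y=label)
```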
Result: it still only predicts the majority class. I also noticed that Defferrard's model is the only one that uses coarsening + pooling between layers, which it does via Graclus. So I tried adding that to DeeperGCN's Res+ block (I commented out anything that deals with edge features, since my data has none). A simplified sketch of the pooling step is below (it assumes `torch-cluster` is installed; `conv1` / `conv2` stand in for the actual Res+ blocks and `lin` for the classifier head):
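```python
# Simplified sketch of Graclus coarsening + pooling between GNN layers,
# analogous to the coarsening in Defferrard's code. conv1 / conv2 / lin are
# stand-ins for the real DeeperGCN Res+ blocks and classifier head.
import torch.nn.functional as F
from torch_geometric.nn import graclus, max_pool, global_mean_pool

def forward(self, data):
    data.x = F.relu(self.conv1(data.x, data.edge_index))
    cluster = graclus(data.edge_index, num_nodes=data.x.size(0))  # pair up nodes
    data = max_pool(cluster, data)   # merge each pair, roughly halving the graph
    data.x = F.relu(self.conv2(data.x, data.edge_index))
    x = global_mean_pool(data.x, data.batch)  # one vector per patient graph
    return self.lin(x)
```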
Again, no difference. I'm quite confident the problem isn't in how I feed in the data or in the data processing, because I checked that multiple times. Of course, I also tried playing around with hyperparameters, especially the learning rate, but nothing came of that. I'm kinda stumped at this point. Was wondering if anyone has any pointers.
-
This is hard to say, actually. I see that the TF code you linked also performs regularization and exponential LR decay, and it transforms the input node features before feeding them to the GNN. Furthermore, the difference might also be induced by different weight initializations. Sadly, I cannot give you clear guidance on what the cause might be :(
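For instance, the regularization + decay part could be mirrored in PyTorch roughly like this (the `weight_decay` / `gamma` values here are placeholders, not the TF code's actual settings):

```python
# Rough PyTorch equivalent of L2 regularization + exponential LR decay;
# all values are placeholders, and model / train_one_epoch / num_epochs
# are hypothetical names, not from the original code.
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

for epoch in range(num_epochs):
    train_one_epoch(model, optimizer)  # hypothetical training step
    scheduler.step()                   # decay the LR once per epoch
```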