Hello all, I am trying to predict house prices with the following model, trained on a heterogeneous graph:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import HeteroConv, GraphConv, Linear


class HeteroGNN(torch.nn.Module):
    def __init__(self, metadata, hidden_channels, out_channels, num_layers):
        super().__init__()
        self.convs = torch.nn.ModuleList()
        for _ in range(num_layers):
            # metadata[1] holds the edge types of the heterogeneous graph
            conv = HeteroConv({
                edge_type: GraphConv((-1, -1), hidden_channels)
                for edge_type in metadata[1]
            }, aggr='sum')
            self.convs.append(conv)
        self.lin = Linear(hidden_channels, out_channels)

    def forward(self, x_dict, edge_index_dict, edge_weight_dict):
        for conv in self.convs:
            x_dict = conv(x_dict, edge_index_dict, edge_weight_dict=edge_weight_dict)
            # also tried F.leaky_relu and torch.sigmoid here
            x_dict = {key: F.relu(x) for key, x in x_dict.items()}
        return self.lin(x_dict['Property'])
```

I have scaled y between 0 and 1, along with all of my other continuous features. The loss successfully reduces to roughly 0.25 MSE after 100 epochs, but I get a negative R² score on the train, validation and test masks. I understand that the model may still perform poorly on the validation and test sets, but the negative train R² is surprising given the decreasing loss. If the model were awful, wouldn't the loss fail to reduce?

I have also constructed a tabular dataset with the same features to check their predictive value. It is not a direct comparison, I know, but it removes the concern that the features are useless: a cross-validated gradient boosting machine reaches an R² of 0.75.

Looking at the GNN's predictions, I can see that it produces some negative outputs, which surprised me, as I didn't think that was possible with a ReLU activation. Are there any other options for output layers and activation functions when doing regression on targets between 0 and 1? I have looked at sigmoid, but always thought that was for classification.

Thanks in advance for any help!
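Here is roughly how I compute the R² score (a sketch; it assumes the standard PyG `train_mask` on the `'Property'` node type):

```python
import torch
from sklearn.metrics import r2_score

# Note: r2_score is 1 - MSE / Var(y); for y scaled to [0, 1], Var(y) <= 0.25,
# so an MSE of ~0.25 can already imply a non-positive R² even as the loss falls.
model.eval()
with torch.no_grad():
    preds = model(data.x_dict, data.edge_index_dict, data.edge_weight_dict)

mask = data['Property'].train_mask
y_true = data['Property'].y[mask].cpu().numpy()
y_pred = preds[mask].squeeze(-1).cpu().numpy()
print('train R²:', r2_score(y_true, y_pred))
```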
Replies: 1 comment
Note that your final linear transformation does not have any activation or clamping, so you should either try to apply `sigmoid` or clamping (`clamp(0, 1)`) to your final predictions.
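For example, a minimal sketch of the forward pass with a bounded output (sigmoid shown; clamping is the drop-in alternative):

```python
def forward(self, x_dict, edge_index_dict, edge_weight_dict):
    for conv in self.convs:
        x_dict = conv(x_dict, edge_index_dict, edge_weight_dict=edge_weight_dict)
        x_dict = {key: F.relu(x) for key, x in x_dict.items()}
    out = self.lin(x_dict['Property'])
    # sigmoid squashes predictions into (0, 1);
    # alternatively: return out.clamp(0, 1)
    return torch.sigmoid(out)
```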