Advice on normalisation #3583
-
Hi! I am seeking advice from experts here on the best strategy for the following problem. I have a series of graphs, each with a real-valued property y, and I have built a model to predict y for each graph. The graphs can have an arbitrary number of nodes and edges, the nodes can belong to different types identified by their initial feature vectors, and each edge has a single feature that depends on the Euclidean distance between the linked nodes.

My plan is to apply a number of graph convolutional layers and then feed the resulting node embeddings into a fully-connected neural network to fit y for each graph in the database. I guess this is all fairly standard. Node and edge features are normalised within [-1, 1].

My question is: should one also normalise y so that its values are contained within some interval, as the node and edge features are? Apparently this is always desirable in conventional neural networks, but after a few trials, normalising y in my model above seems to bring no advantage in terms of ease of convergence or the model's ability to fit the data. Does anyone have advice on the best way to proceed in the case of graph NNs? Thanks in advance.
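For concreteness, here is a minimal sketch of what I mean by normalising y (the tensor and function names are just for illustration; in practice the targets come from my dataset):

```python
import torch

# Hypothetical per-graph targets; in practice these come from the dataset.
y = torch.tensor([12.3, -4.7, 98.1, 0.5])

# Min-max scale y into [-1, 1], mirroring the node/edge feature normalisation.
y_min, y_max = y.min(), y.max()
y_scaled = 2.0 * (y - y_min) / (y_max - y_min) - 1.0

def unscale(y_pred_scaled):
    # Inverse transform, so predictions can be reported in the original units.
    return (y_pred_scaled + 1.0) * (y_max - y_min) / 2.0 + y_min
```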
-
This is super hard to tell. From my experience, normalizing labels can improve performance, but it is never the deciding factor in whether your model is able to fit the data or not. However, normalizing the targets has the advantage that you can use a final non-linearity, e.g., `sigmoid`, to push model outputs into the desired interval. Since it looks like your model is not able to fit the data regardless of normalization, there might be other reasons for this. Feel free to post your architecture and your task so I can take a look :)
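For illustration, a minimal sketch of what such a readout head could look like (assuming a pooled graph embedding of size `hidden_dim`; the names are placeholders, not your actual architecture):

```python
import torch
from torch import nn

hidden_dim = 64  # assumed size of the pooled graph embedding

# Readout head ending in a bounded non-linearity.
head = nn.Sequential(
    nn.Linear(hidden_dim, hidden_dim),
    nn.ReLU(),
    nn.Linear(hidden_dim, 1),
    nn.Sigmoid(),  # outputs in (0, 1); use nn.Tanh() if y is scaled to [-1, 1]
)

graph_embedding = torch.randn(8, hidden_dim)  # dummy batch of 8 graph embeddings
y_pred = head(graph_embedding).squeeze(-1)    # predictions bounded to the target interval
```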