Replies: 1 comment · 3 replies
Standardization and pre-processing are currently something we expect the user to handle. For example, you would first need to gather the per-column statistics `mean = dataset.data.x.mean(dim=0, keepdim=True)` and `std = dataset.data.x.std(dim=0, keepdim=True)`, and then use them to re-scale your features. We can also add support for this, similar to …
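A minimal sketch of the manual approach, assuming a homogeneous `InMemoryDataset` whose collated node features live in `dataset.data.x` (for `HeteroData` you would repeat this for every node and edge type you want to scale):

```python
# `dataset` is an already-processed InMemoryDataset; `dataset.data.x`
# stacks the node features of ALL graphs, so the statistics are global.
mean = dataset.data.x.mean(dim=0, keepdim=True)  # per-column mean
std = dataset.data.x.std(dim=0, keepdim=True)    # per-column std

# Column-wise standardization; clamp `std` so constant features
# do not cause a division by zero.
dataset.data.x = (dataset.data.x - mean) / std.clamp(min=1e-8)
```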
---
Hi PyG Community,
I was initially doing feature scaling and standardization (column-wise MinMax scaling) of my node and edge attributes in the `process()` method when creating my `InMemoryDataset` of `HeteroData`, back when it was going to be just one `data.pt` object with a single graph, accessed via `data[0]`. Now I'm planning to split my data and use the larger `Dataset` class, since the size is huge, so it is going to be a list of `HeteroData` objects, `MyDataset(30)`. This makes batching, loading, and the train/test split very easy for me. However, my concern is with the built-in transforms:
- `NormalizeFeatures`: normalizes row-wise, not column-wise.
- `NormalizeScale`: doesn't seem like what I need; it works on node positions.
- `GCNNorm`: not entirely sure, but it might be what I need based on a few examples? Please let me know if it is.

Also, looking at the `BaseTransform` description, it seems I can do the scaling explicitly, but I was looking for a way to do it as a transform while still considering all edges in all graphs when computing the scaling statistics (see the sketch below). I'm not sure an explicit per-graph approach does that.

I'm not sure why I haven't seen a lot of this (maybe the public datasets are already preprocessed), but is it not necessary to standardize your data for GNNs? I would love to know if it can be avoided, and whether normalizing instead makes no difference.
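To make that concrete, the explicit version I currently have in mind is roughly this (just a sketch; the edge type `('node_a', 'to', 'node_b')` is a placeholder for one of mine):

```python
import torch

# `MyDataset` is my own Dataset subclass holding 30 HeteroData objects.
dataset = MyDataset(root='data/')

# Concatenate the edge attributes of one edge type across ALL graphs,
# so the min/max are global rather than per-graph.
all_edge_attr = torch.cat(
    [data['node_a', 'to', 'node_b'].edge_attr for data in dataset], dim=0)

edge_min = all_edge_attr.min(dim=0, keepdim=True).values
edge_max = all_edge_attr.max(dim=0, keepdim=True).values
```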
Another thing I would like some confirmation on is custom transforms: can I create a custom `Transform`? For instance, my `edge_index` is `[2, 1000]` and my `edge_attr` is `[1000, 45]`. I wanted to know whether such scaling can be done through a transform, so that whenever I create a new input for a data object I can pass it through the transform, like we do in sklearn, and it takes care of the data processing.
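Roughly, I imagine the custom transform looking like this (a sketch only, not working code; `edge_min`/`edge_max` would be the global statistics computed above, and the edge type is again a placeholder):

```python
from torch_geometric.transforms import BaseTransform

class ColumnMinMaxScale(BaseTransform):
    """Column-wise min-max scaling of the edge attributes of one edge
    type, using min/max precomputed over all graphs in the dataset."""

    def __init__(self, edge_min, edge_max,
                 edge_type=('node_a', 'to', 'node_b')):
        self.edge_min = edge_min
        self.edge_max = edge_max
        self.edge_type = edge_type

    def __call__(self, data):
        store = data[self.edge_type]
        span = (self.edge_max - self.edge_min).clamp(min=1e-8)  # avoid /0
        store.edge_attr = (store.edge_attr - self.edge_min) / span
        return data

# Applied lazily on every access, like the built-in transforms:
# dataset = MyDataset(root='data/',
#                     transform=ColumnMinMaxScale(edge_min, edge_max))
```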
I would be grateful for any comments, suggestions, or advice on these two issues.