Data leakage issue when using NeighborSampler? #5059
Unanswered
Dennis-Tsai asked this question in Q&A
Replies: 1 comment 2 replies
This is the expected behaviour.
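A minimal sketch of the usual reasoning, assuming the standard transductive node-classification setup used in PyG's `NeighborSampler` examples. Features of validation/test neighbors may flow into message passing, but the loss only reads labels for the first `batch_size` entries of `n_id` (the seed nodes, as in `y[n_id[:batch_size]]`), so validation/test labels never enter training. If stricter isolation is wanted (an inductive split), one option is to sample training batches on the subgraph induced by the training nodes only. The helpers `loss_label_indices` and `induced_subgraph` and the toy graph are hypothetical plain-Python stand-ins, not PyG APIs.

```python
def loss_label_indices(n_id, batch_size):
    """Hypothetical helper: the nodes whose labels a NeighborSampler-style
    training step reads, i.e. only the first `batch_size` entries of n_id."""
    return n_id[:batch_size]


def induced_subgraph(adj, keep):
    """Hypothetical helper: drop every node outside `keep` (and its edges)
    from an adjacency dict."""
    keep = set(keep)
    return {u: [v for v in nbrs if v in keep] for u, nbrs in adj.items() if u in keep}


# Toy path graph 0-1-2-3-4-5; nodes 3, 4, 5 are validation nodes.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
train_idx, val_idx = [0, 1, 2], [3, 4, 5]

# n_id as a 2-hop sampler would return it: seed nodes first, then neighbors.
n_id, batch_size = [0, 1, 2, 3, 4], len(train_idx)
print(loss_label_indices(n_id, batch_size))  # [0, 1, 2]: no validation labels

# Inductive alternative: sample training batches on the train-only subgraph.
train_adj = induced_subgraph(adj, train_idx)
print(train_adj)  # {0: [1], 1: [0, 2], 2: [1]}
```

In PyG itself, `torch_geometric.utils.subgraph` can build such an induced subgraph from an `edge_index`; whether the inductive variant is appropriate depends on the evaluation protocol of the dataset.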
While using `NeighborSampler`, we set our selected `train_id` and `valid_id` (which do not overlap) as the `node_idx` when creating the mini-batch dataloaders. However, once we looped through the `train_loader`, the `n_id` (the nodes involved in the training computation) overlapped with our `val_idx`. This means validation/test node data can be accessed during training. Is this a data leakage problem?
Please let me know if I misunderstood any part. Thanks!
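The observation can be reproduced without PyG. The sketch below assumes layer-wise sampling in which each hop expands from every node gathered so far and the seed nodes stay at the front of `n_id`; `sample_computation_nodes` and the path graph are hypothetical illustrations, not the library's implementation.

```python
import random


def sample_computation_nodes(adj, seeds, fanouts, rng=None):
    """Hypothetical stand-in for layer-wise neighbor sampling.

    Returns (batch_size, n_id): the seed (target) nodes come first in n_id,
    followed by every sampled neighbor needed to compute them.
    A fanout of -1 keeps all neighbors of a node.
    """
    rng = rng or random.Random(0)
    n_id = list(seeds)
    seen = set(n_id)
    for fanout in fanouts:
        # Each hop expands from every node gathered so far (snapshot the list).
        for node in list(n_id):
            nbrs = adj.get(node, [])
            if 0 <= fanout < len(nbrs):
                nbrs = rng.sample(nbrs, fanout)
            for nbr in nbrs:
                if nbr not in seen:
                    seen.add(nbr)
                    n_id.append(nbr)
    return len(seeds), n_id


# Toy path graph 0-1-2-3-4-5 with disjoint train/validation splits.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
train_idx = [0, 1, 2]
val_idx = [3, 4, 5]

batch_size, n_id = sample_computation_nodes(adj, train_idx, fanouts=[-1, -1])
overlap = sorted(set(n_id) & set(val_idx))
print(n_id)               # [0, 1, 2, 3, 4]
print(n_id[:batch_size])  # [0, 1, 2]
print(overlap)            # [3, 4]
```

Even though the splits are disjoint, the targets `n_id[:batch_size]` stay inside `train_idx` while the 2-hop neighborhood pulls in validation nodes 3 and 4.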