-
@Cow-Kite Hmm, that should not happen. Which dataset do you use, ogbn-products or ogbn-mag? Did you turn on multithreading? Unfortunately, I don't know the details of your test environment. Maybe there is a communication delay between the machines?
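If you suspect a communication delay between the machines, one quick sanity check is to time a few collective operations outside of PyG. Below is a minimal sketch using plain `torch.distributed` with the `gloo` backend; the payload size and the launch setup (the `MASTER_ADDR`/`MASTER_PORT`/`RANK`/`WORLD_SIZE` environment variables, e.g. as set by `torchrun`) are assumptions and not part of the example script.

```python
# Minimal sketch: time a few all-reduce rounds between the training machines.
# Assumes MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE are set on each node
# (torchrun sets these for you); run the same script on every machine.
import time

import torch
import torch.distributed as dist


def main() -> None:
    dist.init_process_group(backend="gloo")  # CPU-friendly backend
    rank = dist.get_rank()

    payload = torch.ones(1024 * 1024)  # ~4 MB probe tensor
    dist.barrier()  # make sure every rank starts timing together

    start = time.perf_counter()
    for _ in range(10):
        dist.all_reduce(payload)  # forces a round of cross-node communication
    dist.barrier()
    elapsed = time.perf_counter() - start

    if rank == 0:
        print(f"10 all-reduce rounds over {dist.get_world_size()} ranks: {elapsed:.3f}s")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

If this already takes noticeably longer on four machines than on two, the slowdown is likely coming from the network rather than from PyG itself.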
-
You might want to check the output of
-
Hello. I'm currently running PyG's distributed training example code (github: /examples/distributed/pyg/node_ogb_cpu.py).
Distributed training is run twice, once with the graph divided into two partitions and once into four partitions.
Through experiments, we found that the training time per epoch on four nodes was more than twice the time on two nodes.
In theory, the epoch times should be roughly the same, so why does this happen?
The number of epochs, the mini-batch size, and all distributed-training settings are identical in both runs.
We only tested with two and four nodes; a rough sketch of how we time each epoch is shown below.
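For reference, this is roughly how we measure the per-epoch time; `model`, `train_loader`, `optimizer`, and `criterion` are placeholders for the objects built in `node_ogb_cpu.py`, and the `batch.x` / `batch.edge_index` / `batch.batch_size` attributes follow the usual PyG neighbor-loader conventions, so treat it as an illustration of the measurement rather than the exact script.

```python
# Sketch of the per-epoch timing; the model/loader objects are placeholders
# for what node_ogb_cpu.py builds, not the exact code from the example.
import time


def train_one_epoch(model, train_loader, optimizer, criterion) -> float:
    model.train()
    start = time.perf_counter()
    for batch in train_loader:  # mini-batches sampled from the local partition
        optimizer.zero_grad()
        out = model(batch.x, batch.edge_index)[: batch.batch_size]  # seed nodes only
        loss = criterion(out, batch.y[: batch.batch_size])
        loss.backward()
        optimizer.step()
    return time.perf_counter() - start  # wall-clock seconds for one epoch
```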
This is the result on two nodes:
This is the result on four nodes:
Thank you very much!