Open
Labels
Good Issue (Good reference for newcomers)
Description
I noticed that group_size is set to world_size in the examples, but as I understand it, group_size can be set to other values.
https://github.com/KaiyuYue/torchshard/blob/main/torchshard/distributed/core.py#L18
I also found that get_world_size() returns the total number of processes.
Together, these two findings confuse me in a multi-node setting, say 2 nodes with 2 processes each.
If group_size is 2, there are 2 distinct groups in addition to the default group (which overlaps with them). However, because get_world_size() is called without specifying a group, a layer can end up split into 4 parts, whereas I would expect 2 parts in this case.
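To make the arithmetic concrete, here is a minimal sketch in plain Python. It does not call torchshard or PyTorch; `build_groups`, `shard_width`, and the layer width of 1024 are hypothetical names and values I made up for illustration, and the consecutive-rank grouping is my assumption about how the groups would be formed.

```python
# Hypothetical helpers for illustration only; not part of torchshard or PyTorch.

def build_groups(world_size, group_size):
    """Partition ranks 0..world_size-1 into consecutive groups of group_size."""
    if world_size % group_size != 0:
        raise ValueError("world_size must be divisible by group_size")
    return [
        list(range(start, start + group_size))
        for start in range(0, world_size, group_size)
    ]


def shard_width(num_features, num_parts):
    """Width of each shard when a layer dimension is split evenly."""
    if num_features % num_parts != 0:
        raise ValueError("num_features must be divisible by num_parts")
    return num_features // num_parts


world_size = 2 * 2  # 2 nodes x 2 processes per node
group_size = 2

# Assumed grouping: ranks 0-1 form one group, ranks 2-3 the other.
groups = build_groups(world_size, group_size)

num_features = 1024  # assumed layer width, chosen only for the example

# Splitting by the global world size gives 4 shards per layer.
width_by_world = shard_width(num_features, world_size)

# Splitting by the group size gives the 2 shards I would expect.
width_by_group = shard_width(num_features, group_size)

print(groups, width_by_world, width_by_group)
```

For reference, `torch.distributed.get_world_size` accepts an optional `group` argument and returns the size of that group when one is passed, so the group-local size should be available if the group handle is forwarded.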
Correct me if I am wrong.