@skpig (Collaborator) commented Dec 9, 2021

  1. Add Multi_Node_Training to image_classification.
  2. Write a multi-node training tutorial in README.md.
  3. Use the ViT model as the example.

@skpig added the enhancement label Dec 9, 2021
@xperzy (Collaborator) commented Dec 10, 2021

BTW, in main_multi_gpu.py, please also check the logging and model-saving scheme, such as

The current code does not consider the case where world_size > 1.
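
One common way to handle this, sketched here as an assumption and not as what main_multi_gpu.py actually does, is to let only the process with global rank 0 write logs and checkpoints. The helper names `should_write` and `writers` are hypothetical and exist only for illustration.

```python
def should_write(global_rank: int) -> bool:
    """Return True only for the one process allowed to log and save.

    Assumption: global rank 0 is the designated writer.
    """
    return global_rank == 0


def writers(world_size: int) -> list:
    """List the ranks that would write, given the total process count."""
    return [rank for rank in range(world_size) if should_write(rank)]


# With 4 processes in total, only one of them passes the guard.
print(writers(4))
```

Guarding on the global rank rather than a per-host index keeps the number of writers at one regardless of how many hosts take part.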

@skpig (Collaborator, Author) commented Dec 10, 2021

I just checked the details of the spawn() function. For example, suppose we have 2 hosts with 2 processes running on each host. Then
local_rank = dist.get_rank() returns 0, 1, 2, 3 across the four processes. I guess the original code works fine?
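
To make the example concrete, here is a small sketch of the numbering, in plain Python with no distributed runtime. It assumes ranks are assigned contiguously host by host, which is my assumption about the ordering. The function name `global_ranks` is hypothetical.

```python
def global_ranks(num_hosts: int, procs_per_host: int) -> dict:
    """Map (host index, process index on that host) to a global rank.

    Assumption: ranks are numbered contiguously, host by host.
    """
    return {
        (host, proc): host * procs_per_host + proc
        for host in range(num_hosts)
        for proc in range(procs_per_host)
    }


ranks = global_ranks(num_hosts=2, procs_per_host=2)
world_size = len(ranks)

# Each process gets a distinct rank, so exactly one of them has rank 0.
for (host, proc), rank in sorted(ranks.items()):
    print(f"host {host}, process {proc}: rank {rank} of {world_size}")
```

Under this numbering the variable named local_rank actually holds a global rank, so a check such as local_rank == 0 would match a single process across both hosts.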

@xperzy self-requested a review December 13, 2021 03:23