Added torchrun compatibility for distributed training across multiple GPUs in a single node (single instance) #1565
| Job | Run time |
|---|---|
| | 2s |
| | 2s |
| | 29m 35s |
| | 29m 20s |
| | 28m 29s |
| | 37m 53s |
| | 1h 21m 4s |
| | 18m 49s |
| Total | 3h 45m 14s |