Commit bd9bb36
Allow ports to be reused in gloo (pytorch#97677)
Summary:
Pull Request resolved: pytorch#97677
X-link: pytorch/gloo#353
ProcessGroupGloo and gloo appear to open and close sockets without allowing the port to be reused. In larger training jobs this shows up as "Address already in use" errors, which we assume happen because all the ephemeral ports are exhausted.
This diff allows ports to be reused; with it, we see fewer ports in the `TIME_WAIT` state.
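The diff body is not shown here, so the exact socket option it sets is not visible. As a minimal sketch, assuming Linux semantics and using `SO_REUSEADDR` as one common way to let a port be reused, the snippet below reproduces the symptom on loopback: the server side closes a connection first, which leaves its end in `TIME_WAIT`, and then tries to bind a new listener to the same port. `rebind_after_active_close` is a hypothetical helper for illustration, not part of gloo or c10d.

```python
import socket


def rebind_after_active_close(reuse: bool) -> bool:
    """Hypothetical helper: return True if a listener can be re-bound to the
    same port right after the server side actively closed a connection on it."""

    def make_listener(port: int) -> socket.socket:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            if reuse:
                # Let bind() succeed even though an old connection on this
                # port is still lingering (e.g. in TIME_WAIT).
                s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            s.bind(("127.0.0.1", port))
            s.listen(1)
            return s
        except OSError:
            s.close()
            raise

    listener = make_listener(0)  # port 0: let the OS pick a free port
    port = listener.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = listener.accept()  # on Linux, inherits SO_REUSEADDR from listener
    conn.close()    # server closes first, so its end ends up in TIME_WAIT
    client.recv(1)  # returns b"" once the server's FIN arrives
    client.close()
    listener.close()

    try:
        make_listener(port).close()
        return True
    except OSError:  # typically EADDRINUSE: "Address already in use"
        return False


print(rebind_after_active_close(reuse=True))
```

On Linux we would expect `True` here; with `reuse=False` the rebind typically fails with "Address already in use" until the `TIME_WAIT` entry expires (about 60 seconds by default).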
context: https://fb.workplace.com/groups/319878845696681/permalink/5988899781205532/
another issue: https://fb.workplace.com/groups/319878845696681/permalink/958768178474408/
Test Plan: Add a gloo test that creates 4 groups of size 64 using the multithreaded PG + gloo, 256 ranks in total.
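To get a rough sense of how many connections the test configuration opens, here is a back-of-the-envelope sketch. It assumes each group connects its ranks as a full mesh (one TCP connection per pair of ranks), which is what gloo's full-mesh rendezvous does; the helper name `full_mesh_connections` is hypothetical.

```python
from math import comb


def full_mesh_connections(group_size: int, num_groups: int) -> int:
    # Each unordered pair of ranks within a group needs one connection.
    return num_groups * comb(group_size, 2)


ranks = 4 * 64                         # 256 ranks in total
conns = full_mesh_connections(64, 4)   # 4 * (64 * 63 // 2)
print(ranks, conns)  # 256 8064
```

Since the multithreaded PG runs all ranks on one host, every one of these connections draws on the same machine's port space, which is what makes this a useful stress test for port reuse.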
Differential Revision: D44029927
fbshipit-source-id: 9c31c38485333602c33e12c12813bea33ccb9438
Parent: 97fc8ea
2 files changed, 47 insertions(+), 0 deletions(-): test/distributed, torch/csrc/distributed/c10d
Hunks: 29 lines added after original line 222; 18 lines added after original line 640.