Commit a5707c7
Added markdown of the CLI help messages. (#51)
* Added markdown of the CLI help messages.
* Fixed some bugs in the examples.
* Fixed a bug where the number of GPUs required was returned as a float
rather than an int.
* Fixed how banks and accounts are specified.
* Fixed a bug in the check for None vs. an empty string in the
configuration of the launch directory argument.
Updated some of the test configuration examples.
* Improved how the system architecture (especially if overridden) is reported.
* Added guards for setting output or error log files for ephemeral jobs.
* Added function to scheduler class to get the environment variable for
each rank's ID. Updated the launch script so that it gets the RANK
environment variable so that it can write out the hostlist if
necessary. Improved the guards for ephemeral job CLI flags.
* Removed debugging code
* Cleaned up and improved integration with the torchrun-hpc CLI argument
to set the max memory size and the CLI parameter list. Fixed a bug in
how the system parameters are mutated from the CLI.
* Updated env variable.
* Added a default argument for the max GPU memory.
* Fixed how Slurm runs check for the root node in a torch run.
* Finished cleaning up the torchrun-hpc CLI examples.
* Fixed tests to use a launch directory.
* Minor cleanup
1 parent 2d93ff5 commit a5707c7
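Three of the fixes listed in the commit message (the integer GPU count, the None vs. empty string check, and the per-rank environment variable lookup) can be illustrated with a minimal sketch. The diff content is not reproduced here, so every function name, parameter, and default below is hypothetical and is not taken from hpc_launcher. `SLURM_PROCID` and `FLUX_TASK_RANK` are real scheduler variables, but their use in this mapping is an assumption made for illustration.

```python
import math
import os
from typing import Mapping, Optional

# Assumed mapping from scheduler name to the variable holding each rank's ID.
RANK_ENV_VARS = {"slurm": "SLURM_PROCID", "flux": "FLUX_TASK_RANK"}


def gpus_required(model_mem_gb: float, gpu_mem_gb: float) -> int:
    """Hypothetical helper: GPUs needed to hold model_mem_gb of state."""
    if gpu_mem_gb <= 0:
        raise ValueError("gpu_mem_gb must be positive")
    # math.ceil returns an int in Python 3, so the count is never a float.
    return max(1, math.ceil(model_mem_gb / gpu_mem_gb))


def resolve_launch_dir(launch_dir: Optional[str], default: str = "launch") -> Optional[str]:
    """Hypothetical helper: treat None and "" as two different cases.

    Assumed semantics: None means the option was not given at all, while ""
    means it was given without a value and a default name should be used.
    """
    if launch_dir is None:
        return None
    if launch_dir == "":
        return default
    return launch_dir


def rank_id(scheduler: str, environ: Mapping[str, str] = os.environ) -> int:
    """Hypothetical helper: read this process's rank from the environment."""
    var_name = RANK_ENV_VARS[scheduler]
    # Fall back to rank 0 when the variable is unset (e.g. a local run).
    return int(environ.get(var_name, "0"))


if __name__ == "__main__":
    print(gpus_required(100.0, 40.0))
    print(resolve_launch_dir(""))
    print(rank_id("slurm", {"SLURM_PROCID": "3"}))
```

Comparing against `None` explicitly, rather than relying on truthiness, is what keeps the "not given" and "given but empty" cases apart; a plain `if not launch_dir:` would merge them.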
File tree
18 files changed (+890, -31 lines changed)
- hpc_launcher
- cli
- schedulers
- systems
- torch
- tests