Commit f89b989

Update documentation for elastic training arguments (#343)
* Update documentation for elastic training arguments
* nit: Add detail descriptions for array type
1 parent c64811d commit f89b989

2 files changed: +12 -0 lines changed

README.md

Lines changed: 6 additions & 0 deletions
@@ -364,6 +364,12 @@ hyp create hyp-pytorch-job \
 | `--accelerator-partition-limit` | INTEGER | No | Limit for the number of accelerator partitions (minimum: 1) |
 | `--preferred-topology` | TEXT | No | Preferred topology annotation for scheduling |
 | `--required-topology` | TEXT | No | Required topology annotation for scheduling |
+| `--max-node-count` | INTEGER | No | Maximum number of nodes|
+| `--elastic-replica-increment-step` | INTEGER | No | Scaling step size for elastic training. Provide either this or elastic-replica-discrete-values|
+| `--elastic-graceful-shutdown-timeout-in-seconds` | INTEGER | No | Graceful shutdown timeout in seconds for elastic scaling operations|
+| `--elastic-scaling-timeout-in-seconds` | INTEGER | No | Scaling timeout for elastic training|
+| `--elastic-scale-up-snooze-time-in-seconds` | INTEGER | No | Timeout period after job restart during which no scale up/workload admission is allowed|
+| `--elastic-replica-discrete-values` | ARRAY | No | Alternative to elastic-replica-increment-step. Provides exact values for total replicas count (array of integers)|
 | `--debug` | FLAG | No | Enable debug mode (default: false) |
 
 #### List Available Accelerator Partition Types

doc/cli/training/cli_training.md

Lines changed: 6 additions & 0 deletions
@@ -206,6 +206,12 @@ hyp create hyp-pytorch-job [OPTIONS]
 | `--memory-limit` | FLOAT | No | Limit for the amount of memory in GiB |
 | `--preferred-topology` | TEXT | No | Preferred topology annotation for scheduling |
 | `--required-topology` | TEXT | No | Required topology annotation for scheduling |
+| `--max-node-count` | INTEGER | No | Maximum number of nodes|
+| `--elastic-replica-increment-step` | INTEGER | No | Scaling step size for elastic training. Provide either this or elastic-replica-discrete-values|
+| `--elastic-graceful-shutdown-timeout-in-seconds` | INTEGER | No | Graceful shutdown timeout in seconds for elastic scaling operations|
+| `--elastic-scaling-timeout-in-seconds` | INTEGER | No | Scaling timeout for elastic training|
+| `--elastic-scale-up-snooze-time-in-seconds` | INTEGER | No | Timeout period after job restart during which no scale up/workload admission is allowed|
+| `--elastic-replica-discrete-values` | ARRAY | No | Alternative to elastic-replica-increment-step. Provides exact values for total replicas count (array of integers)|
 | `--debug` | FLAG | No | Enable debug mode (default: false) |
 
 ### Volume Configuration
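
Reviewer note: the added rows describe `--elastic-replica-increment-step` and `--elastic-replica-discrete-values` as alternatives ("Provide either this or elastic-replica-discrete-values"), with the latter being an array of integers. A minimal sketch of that rule, as a reader might check it before calling the CLI, could look like the following. The function name `check_elastic_replica_args` is hypothetical and not part of the `hyp` CLI, and the positive-value checks are an assumption that the documentation does not state.

```python
from typing import Optional, Sequence


def check_elastic_replica_args(
    increment_step: Optional[int] = None,
    discrete_values: Optional[Sequence[int]] = None,
) -> None:
    """Hypothetical pre-flight check; raises ValueError on a bad combination."""
    # Documented rule: the two options are alternatives, so reject both at once.
    if increment_step is not None and discrete_values is not None:
        raise ValueError(
            "provide either --elastic-replica-increment-step or "
            "--elastic-replica-discrete-values, not both"
        )
    # Assumption (not in the docs): a step size must be a positive integer.
    if increment_step is not None and increment_step < 1:
        raise ValueError("increment step must be >= 1")
    if discrete_values is not None:
        # Documented type: an array of integers.
        if not all(isinstance(v, int) for v in discrete_values):
            raise ValueError("discrete values must all be integers")
        # Assumption (not in the docs): replica counts must be positive.
        if any(v < 1 for v in discrete_values):
            raise ValueError("discrete values must all be >= 1")
```

Both options are marked optional in the table, so the sketch accepts a call with neither set. Whether the CLI itself enforces any of these checks is not shown in this commit.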
