hyp-pytorch-job does not correctly use --accelerators/--accelerators-limits to specify resource requests and limits #317

For training jobs, if you specify `--accelerators/--accelerators-limits` but not `--instance-type`, it will request
```
          resources:
            limits:
              nvidia.com/gpu: '0'
            requests:
              nvidia.com/gpu: '0'
```
regardless of how many accelerators you request. If you additionally specify `--instance-type`, it will use the correct value from `--accelerators/--accelerators-limits`, for some reason. But you cannot use `--accelerators/--accelerators-limits` and `--node-count` at the same time: that gives a validation error.
Thus, you cannot submit multi-node jobs with manually specified resources.
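
To check a generated job for this problem, the container's `resources` block can be compared against the counts that were passed on the command line. The following is only a minimal sketch: it assumes the block has already been loaded into a Python dict (for example from a JSON dump of the pod spec), the helper names `gpu_counts` and `matches_requested` are hypothetical, and the expected count of 8 is an illustrative value, not one taken from this report.

```python
from typing import Any

GPU_KEY = "nvidia.com/gpu"


def gpu_counts(resources: dict[str, Any]) -> dict[str, int]:
    """Return the GPU count under 'requests' and 'limits' (0 if the key is absent)."""
    return {
        section: int(resources.get(section, {}).get(GPU_KEY, 0))
        for section in ("requests", "limits")
    }


def matches_requested(
    resources: dict[str, Any], accelerators: int, accelerators_limit: int
) -> bool:
    """True if the spec carries the counts given via --accelerators/--accelerators-limits."""
    counts = gpu_counts(resources)
    return (
        counts["requests"] == accelerators
        and counts["limits"] == accelerators_limit
    )


# The resources block quoted in this report, as a dict.
observed = {
    "limits": {"nvidia.com/gpu": "0"},
    "requests": {"nvidia.com/gpu": "0"},
}

# Hypothetical expectation: 8 accelerators were requested for both values.
print(gpu_counts(observed))
print(matches_requested(observed, accelerators=8, accelerators_limit=8))
```

A check like this only detects the symptom in the rendered spec; it does not say anything about why the CLI drops the values.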
