This is a design/discussion issue for the command line arguments and syntax for a jobspec-oriented run/submit interface.
The main idea is that there is a "slot shape" and a target entity for task scheduling. I'm not 100% sold on my own terminology here, so please feel free to propose alternatives. Essentially, whether tasks are per-slot or per-resource, and which resource, comes from the --target parameter, which defaults to slot. The number of tasks can be either per-target or a total count, and the shape is specified with a restricted version of the original short-form jobspec I proposed. Here's a sketch of the interface:
flux run
- --file: read jobspec from a file; TODO determine override behavior, for now mutually exclusive with all else

OR

- --target: either slot or a specific resource specified in the request; default: slot
- --slot-shape: short-form resource shape; default: node; current format thought: <resource-type>[\[<min>[:<max>]\]][><resource>|,<resource at same level>] -- basically what we discussed long ago but limited in what can be specified for now, still set up to parse as YAML so you could also put actual YAML/JSON here if you were sufficiently motivated
- --shape-file: read shape as a resource-set from a file
- --nslots: number of slots to request; default: 1; also accepts a range to populate count
- --tasks-per-target: number of tasks to run per target (either slot or resource); default: 1
- --total-tasks: total number of tasks to run in some arrangement across resources; mutually exclusive with --tasks-per-target
- --time: walltime, using flux duration
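To make the short-form grammar above concrete, here is a rough sketch of how it might parse into nested resource dicts. Everything here is illustrative, not an actual implementation: the function name, the output structure, and the choice to emit RFC 14-style type/count/with keys are my assumptions, and it handles only integer counts (no units like 2g, no parenthesized groups).

```python
import re

# One resource token: a type name, optionally followed by [min] or [min:max].
TOKEN = re.compile(r'([a-zA-Z]+)(?:\[(\d+)(?::(\d+))?\])?')

def parse_shape(spec):
    """Parse e.g. 'socket[2]>core[4]' into nested resource dicts (sketch)."""
    levels = spec.split('>')            # '>' descends one resource level
    parsed = None
    for level in reversed(levels):      # build from the leaves upward
        siblings = []
        for item in level.split(','):   # ',' separates same-level resources
            m = TOKEN.fullmatch(item)
            if not m:
                raise ValueError(f'bad resource token: {item!r}')
            rtype, lo, hi = m.groups()
            res = {'type': rtype, 'count': int(lo) if lo else 1}
            if hi:                      # a [min:max] range becomes a dict
                res['count'] = {'min': int(lo), 'max': int(hi)}
            siblings.append(res)
        if parsed:
            siblings[-1]['with'] = parsed
        parsed = siblings
    return parsed
```

For example, parse_shape('socket[2]>core[4]') would yield a socket resource with count 2 whose 'with' list holds a core resource with count 4.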
use-cases, drawn from rfc14:
- Request 4 nodes:
  flux run --nslots 4
- Request between 3 and 30 nodes:
  flux run --nslots 3:30
- Request 4 tasks (sic. was nodes, but that would be the same as the following) with at least 2 sockets each, and 4 cores per socket (not planning to support sockets yet, but):
  flux run --nslots 4 --shape socket[2]>core[4]
- Request an exclusive allocation of 4 nodes that have at least two sockets and 4 cores per socket:
  flux run --nslots 4 --shape node>socket[2]>core[4]
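For reference, the last example might expand to a canonical jobspec roughly like the following. This is a sketch only: field names follow RFC 14, but whether the slot wraps the node or sits inside it is exactly the ordering question raised at the end of this issue, and the task command is a placeholder.

```yaml
version: 1
resources:
  - type: slot
    count: 4           # from --nslots 4
    label: default
    with:
      - type: node     # exclusive node per slot
        count: 1
        with:
          - type: socket
            count: 2
            with:
              - type: core
                count: 4
tasks:
  - command: ["app"]   # placeholder
    slot: default
    count:
      per_slot: 1
```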
Skipping the complex examples, since we don't plan to support them yet; for now the recommended mechanism for those would be writing the jobspec directly.
use-case set 2:
- Run hostname 20 times on 4 nodes, 5 per node:
  flux run --nslots 4 --total-tasks 20 hostname
  flux run --nslots 4 --tasks-per-slot 5 hostname
  flux run --slot-shape node[4] --tasks-per-resource node:5 hostname
- Run 5 copies of hostname across 4 nodes, default distribution:
  flux run --nslots 4 --total-tasks 5 hostname
- Run 10 copies of myapp, require 2 cores per copy, for a total of 20 cores:
  flux run --nslots 10 --shape core[2] myapp
  (Multiple binaries are not necessarily on tap yet, but I'm thinking of allowing multiple of these on the same command line with a separator, which would probably get to the same place.)
- Run 10 copies of app across 10 cores with at least 2GB per core:
  flux run --shape (core,memory[2g]) app
  (possibly amounts we may need to revisit)
- Run 10 copies of app across 2 nodes with at least 4GB per node:
  flux run --shape node>memory[4g] --total-tasks 10 app
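The --tasks-per-target vs. --total-tasks distinction maps naturally onto the task count keys that RFC 14 already defines (per_slot and total). The flag-to-key mapping below is my assumption about how the translation would go, not settled behavior:

```yaml
# flux run --nslots 4 --tasks-per-slot 5 hostname
tasks:
  - command: ["hostname"]
    slot: default
    count:
      per_slot: 5    # 5 tasks in each of the 4 slots

# flux run --nslots 4 --total-tasks 20 hostname
tasks:
  - command: ["hostname"]
    slot: default
    count:
      total: 20      # 20 tasks distributed across the 4 slots
```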
One possible issue here is that several of our use-cases require the slot to be outside the node for them to be easily expressible. Opening another issue for discussion of jobspec-V1 and ordering shortly.