@@ -157,6 +157,63 @@ spec:
157157
158158**Note that we have support for a custom application container, but haven't written any good examples yet!**
159159
160+ ## Workload
161+
162+ ### workload-flux
163+
164+ If you need to "throw" Flux Framework into your container to use as a scheduler, you can do that with an addon!
165+
166+ > Yes, it's astounding. 🦩️
167+
168+ This works by way of the same trick that we use for other addons that have a complex (and/or large) install setup. We:
169+
170+ - Build the software into an isolated spack "copy" view
171+ - The software is then (generally) at some `/opt/view` and `/opt/software`
172+ - The flux container is added as a sidecar container to your pod for your replicated job
173+ - Additional setup / configuration is done here
174+ - We can then create an empty volume that is shared by your metric or scaled application
175+ - The entire tree is copied over into the empty volume
176+ - When the copy is done, indicated by the final touch of a file, the updated container entrypoint is run
177+ - This typically means we have taken your metric command, and wrapped it in a Flux submit.
178+
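The copy-then-signal handshake described in the list above can be sketched roughly as follows. This is only an illustration in Python, not the entrypoint the operator generates: the marker file name `view-ready.txt`, the function names, and the use of temporary directories in place of the sidecar filesystem and the shared empty volume are all assumptions made for the example.

```python
import shutil
import tempfile
import time
from pathlib import Path

# Hypothetical marker name; the real file touched by the sidecar is not named here.
READY_MARKER = "view-ready.txt"


def sidecar_copy(view: Path, shared: Path) -> None:
    """Sidecar side: copy the whole view tree, then touch the marker last."""
    shutil.copytree(view, shared / "view", dirs_exist_ok=True)
    (shared / READY_MARKER).touch()


def wait_for_view(shared: Path, timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Application side: poll until the marker exists or the timeout passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if (shared / READY_MARKER).exists():
            return True
        time.sleep(interval)
    return (shared / READY_MARKER).exists()


def demo() -> bool:
    """Run both sides, with temporary directories standing in for the containers."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        view, shared = Path(src), Path(dst)
        (view / "bin").mkdir()
        (view / "bin" / "flux").write_text("placeholder")
        sidecar_copy(view, shared)
        ready = wait_for_view(shared, timeout=1.0)
        return ready and (shared / "view" / "bin" / "flux").exists()
```

Touching the marker only after the copy returns is what lets the application side treat "marker exists" as "the full tree is in place" before it runs the updated entrypoint.
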
179+ It's really cool because it means you can run a metric / application with Flux without needing
180+ to install it into your container to begin with. The one important detail is that the operating
181+ system of your container should generally match that of the view. The current view uses Rocky Linux,
182+ but the image is customizable (and we can provide other bases if/when requested). Here are the
183+ arguments you can customize under the metric -> options.
184+
185+ | Name | Description | Type | Default |
186+ |-----|-------------|------------|------|
187+ | mount | Path to mount flux view in application container | string | /opt/share |
188+ | tasks | Number of tasks `-n` to give to flux (not provided if not set) | string | unset |
189+ | image | Customize the container image | string | `ghcr.io/rse-ops/spack-flux-rocky-view:tag-8` |
190+ | fluxUser | The flux user (currently not used, but TBA) | string | flux |
191+ | fluxUid | The flux user ID (currently not used, but TBA) | string | 1004 |
192+ | interactive | Run flux in interactive mode | string | "false" |
193+ | connectTimeout | How long ZeroMQ should wait to retry | string | "5s" |
194+ | quorum | The number of brokers to require before starting the cluster | string | (total brokers or pods) |
195+ | debugZeroMQ | Turn on ZeroMQ debugging | string | "false" |
196+ | logLevel | Customize the flux log level | string | "6" |
197+ | queuePolicy | Queue policy for flux to use | string | fcfs |
198+ | workerLetter | The letter that the worker job is expected to have | string | w |
199+ | launcherLetter | The letter that the launcher job is expected to have | string | w |
200+ | workerIndex | The index of the replicated job for the worker | string | 0 |
201+ | launcherIndex | The index of the replicated job for the launcher | string | 0 |
202+ | preCommand | Pre-command logic to run in launcher/workers before flux is started (after setup in flux container) | string | unset |
203+
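To make the `tasks` behavior concrete, here is a small sketch of how a metric command might be wrapped in a Flux submit. The helper name `wrap_in_flux_submit` is hypothetical and the exact command line the operator builds is not shown in this document; the only things taken from the table are that `-n` is passed when `tasks` is set and left out otherwise.

```python
import shlex
from typing import List, Optional


def wrap_in_flux_submit(command: str, tasks: Optional[str] = None) -> List[str]:
    """Prefix a metric command with `flux submit`, adding -n only when tasks is set."""
    argv = ["flux", "submit"]
    if tasks:
        # Options arrive as strings in the table above, so tasks is kept as a string.
        argv += ["-n", str(tasks)]
    return argv + shlex.split(command)
```

For example, `wrap_in_flux_submit("lmp -in in.lj", tasks="4")` yields `flux submit -n 4 lmp -in in.lj` as an argument list, while leaving `tasks` unset drops the `-n 4` pair.
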
204+ Note that the number of pods for flux, along with the namespace and service name,
205+ defaults to the values in your MetricSet.
206+
207+ **Important**: the flux addon is currently supported for metric types that:
208+
209+ 1. Have the launcher / worker design (so the hostlist.txt is present in the PWD)
210+ 2. Have scp installed, as the shared certificate needs to be copied from the lead broker to all followers
211+ 3. Ideally have munge installed. We do try to install it, but it is better for it to already be there.
212+
213+ We also currently run flux as root. This is considered bad practice, but is probably OK
214+ for this early development work. We don't see a need to have shared namespace / operator
215+ environments at this point, which is why running as a non-root user has not been added yet.
216+
160217## Performance
161218
162219### perf-hpctoolkit