Commit 5f55d3c

Merge pull request #6803 from garlick/issue#5784
doc: add workflows page
2 parents 296a2d8 + c1e4513 commit 5f55d3c

5 files changed

+82
-0
lines changed

doc/Makefile.am

Lines changed: 1 addition & 0 deletions
@@ -478,6 +478,7 @@ EXTRA_DIST = \
 	guide/kvs.rst \
 	guide/broker.rst \
 	guide/troubleshooting.rst \
+	guide/workflows.rst \
 	$(RST_FILES) \
 	man1/index.rst \
 	man1/common/resources.rst \

doc/guide/glossary.rst

Lines changed: 8 additions & 0 deletions
@@ -128,6 +128,14 @@ used in our documentation that may not be familiar to all readers.
     The range is 0 to 31, with a default value of 16. Urgency is defined
     by :doc:`rfc:spec_30`.
 
+  workflow
+    A set of related jobs that are orchestrated to accomplish a goal.
+    In Flux, orchestration naturally maps to the :term:`initial program` of
+    a Flux instance. An example of a simple workflow is a batch job whose
+    batch script submits a set of inter-dependent jobs then waits for them to
+    complete.
+
+
 Footnotes
 ---------
 

doc/guide/workflows.rst

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
+
+.. _workflows:
+
+#########
+Workflows
+#########
+
+
+Flux is designed for scientific :term:`workflows <workflow>`.
+A workflow naturally maps to a Flux instance, with orchestration of its jobs
+by its :term:`initial program`. Since a Flux instance is also a job, Flux
+workflows are highly composable. Because Flux runs as a parallel job under
+other resource managers such as Slurm, Flux workflows can be pleasantly
+portable compared to other approaches.
+
+Workflow orchestration can be a simple batch script that runs jobs in
+sequence or with simple job dependencies, or it can be a more sophisticated
+workflow application like
+`Maestro <https://maestrowf.readthedocs.io/en/latest/Maestro/index.html>`_.
+
+A workflow orchestrator can make full use of Flux's distributed services
+such as its key-value store.
+
+**************
+KVS Guidelines
+**************
+
+The Flux KVS provides a distributed, *eventually consistent*, persistent
+data store. It supports atomic commits and synchronization via messages.
+The :doc:`kvs` design document describes it in more detail.
+
+The KVS is highly scalable for some use cases, such as sharing data with many
+processes in a large parallel job. However, it is not always the preferred
+way to store workflow data. The following guidelines may be helpful for
+understanding how to use it effectively.
+
+.. note::
+
+   This section is under construction and is currently pretty sparse!
+   For now, please open an issue or discussion in the flux-core github
+   repo if you have specific KVS questions or would like to discuss your
+   workflow storage and synchronization requirements.
+
+- The default location for the KVS backing store is ``/tmp`` on the first
+  node of the Flux instance's allocation. On some systems, this may be
+  a ramdisk with limited space. See the description of ``statedir`` in
+  :man7:`flux-broker-attributes` for info on redirecting the KVS backing
+  store to another location.
+
+- By default a batch job's KVS content is cleaned up when the instance
+  terminates. Use the :option:`flux batch --dump` option to preserve KVS
+  content.
+
+- Job data is stored in the KVS under the ``job`` directory. The jobs
+  themselves may store data in the KVS directory assigned to their job.
+  This convention is described in :doc:`rfc:spec_16`.
+
+- The KVS backing storage requirements go way up if there is significant
+  churn in content, since every change percolates to the root of the hash
+  tree and all versions of the tree are preserved. Rewriting a key many
+  times during execution may be considered an anti-pattern.
+
+- Keys with huge values can cause head of line blocking in the broker,
+  where the transfer of a large KVS message through a shared channel
+  delays other messages. A parallel file system is a better place to
+  store big data.
+
+- :man1:`flux-archive` may be helpful in some use cases, especially where
+  the data becomes input to subsequent jobs because then the `stage-in`
+  job shell plugin can be used.
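
The churn guideline in doc/guide/workflows.rst (every change percolates to the root of the hash tree and all versions are preserved) can be illustrated with a toy model in Python. This is only a sketch under stated assumptions, not the Flux KVS implementation: the class ``ToyHashTreeStore``, its methods, the JSON encoding, and the choice of SHA-1 are all hypothetical and exist purely for illustration.

```python
import hashlib
import json


class ToyHashTreeStore:
    """Toy content-addressed hash tree (hypothetical; not the Flux KVS)."""

    def __init__(self):
        # hash -> serialized object; nothing is ever deleted, which models
        # "all versions of the tree are preserved"
        self.blobs = {}
        self.root = self._store({})  # reference to the current root directory

    def _store(self, obj):
        data = json.dumps(obj, sort_keys=True).encode()
        ref = hashlib.sha1(data).hexdigest()
        self.blobs[ref] = data
        return ref

    def _load(self, ref):
        return json.loads(self.blobs[ref])

    def put(self, key, value):
        """Write value at a dotted key such as 'a.b.c', yielding a new root."""
        self.root = self._put(self.root, key.split("."), value)

    def _put(self, dir_ref, parts, value):
        directory = self._load(dir_ref) if dir_ref else {}
        name = parts[0]
        if len(parts) == 1:
            directory[name] = {"val": self._store(value)}
        else:
            child = directory.get(name, {}).get("dir")
            directory[name] = {"dir": self._put(child, parts[1:], value)}
        # Each directory on the path is re-stored under a new hash, so the
        # change percolates all the way up to the root.
        return self._store(directory)


def churn_demo(rewrites):
    """Return (objects after one write, objects after `rewrites` rewrites)."""
    once = ToyHashTreeStore()
    once.put("a.b.c", 0)
    many = ToyHashTreeStore()
    for i in range(rewrites):
        many.put("a.b.c", i)
    return len(once.blobs), len(many.blobs)
```

In this model each write to ``a.b.c`` stores one value object plus three directory objects, so ``churn_demo(100)`` reports 5 stored objects after a single write and 401 after 100 rewrites, even though only one value is live at the end. The real KVS differs in many details, but the growth pattern is the reason repeated rewrites of a key are discouraged.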

doc/index.rst

Lines changed: 1 addition & 0 deletions
@@ -101,6 +101,7 @@ Table of Contents
    guide/support
    guide/start
    guide/interact
+   guide/workflows
    guide/admin
    guide/troubleshooting
    guide/glossary

doc/test/spell.en.pws

Lines changed: 2 additions & 0 deletions
@@ -966,3 +966,5 @@ Tplu
 tuolumne
 UWjZ
 YQ
+composable
+orchestrator
