Commit 2dcf0df

Add Best Practices user guide (#1401)
**Pull Request Checklist**

- [x] Fixes #1400 (but is in no way final)
- [x] Documentation/examples added
- [x] [Good commit messages](https://cbea.ms/git-commit/) and/or PR title

**Description of PR**

Add an initial set of best practices to create a new user guide. This isn't meant to be the "final" version as it should be more of a living document that we'll iterate on over time.

As part of this I also created https://github.com/elliotgunton/hera-example-project which can be used to exemplify the best practices and design patterns, especially around Runner scripts.

---------

Signed-off-by: Elliot Gunton <elliotgunton@gmail.com>
1 parent 89afef9 commit 2dcf0df

2 files changed: +163 -0 lines changed

docs/user-guides/best-practices.md

Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
# Best Practices

## Inline vs Runner Scripts

[Runner scripts](script-constructors.md#runner-scripts) have a stronger feature set than inline scripts, so they are the
recommended path. You will likely need to set up a CICD solution as the iterative development process can be cumbersome.
Read more in [Iterating on a Workflow](#iterating-on-a-workflow) and [CICD](#cicd).

## How To Write Workflows

### Laying Out Your Python Code

Hera aims to keep orchestration code outside of your business logic, meaning you should write business logic in
functions decorated with `@script`, and construct your Workflow separately, as seen throughout the examples. This might
involve putting code in different submodules in your application, but overall you are free to write your Workflow how
you want.

An example directory structure for a project using `poetry` might be:

```console
.
├── Dockerfile
├── hera_scratch
│   ├── __main__.py
│   └── workflow.py
├── Makefile
├── poetry.lock
├── pyproject.toml
├── README.md
└── requirements.txt
```

In this project, we use:

* a `Dockerfile` to create an image for the runner script, using the packages in `requirements.txt` and a command
  contained in the `Makefile`
* a `__main__.py` file to run the Workflow from `workflow.py` on Argo Workflows
* standard poetry/Python files

See this layout in the [example Hera project repo](https://github.com/elliotgunton/hera-example-project)!

### Steps vs DAGs

* Use Steps for simple, sequential processing
* Use DAGs to run as many tasks in parallel as possible, and when you only want to describe dependencies, not ordering

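To illustrate what "describing dependencies, not ordering" means, here is a minimal sketch that deliberately avoids Hera's API and uses only the standard library's `graphlib`. The task names are hypothetical. Given only the dependency edges, it works out which tasks could run at the same time, which is the kind of scheduling a DAG leaves to Argo.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "stats": {"extract"},
    "report": {"clean", "stats"},
}


def parallel_batches(deps):
    """Group tasks into batches whose members have no unmet dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        batches.append(ready)
        sorter.done(*ready)
    return batches


print(parallel_batches(dependencies))
# → [['extract'], ['clean', 'stats'], ['report']]
```

With Steps you would write those three batches out yourself, in order. With a DAG you declare only the edges, and `clean` and `stats` can run in parallel without you saying so.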
### Iterating on a Workflow

The development process of iterating on a Workflow can be cumbersome as Argo does not offer a way to "dry run" your
Workflow, and developers often get caught out by simple errors (such as unresolved variables).

Hera should help reduce the number of iterations you need on the Workflow itself, but first you should create your
business logic in isolation, and test it as normal Python (if you are using Python). The Workflow should then be written
with the intention of testing the plumbing between steps on a live Argo Workflows installation. You should have easily
"mockable" business logic, in the sense that if your Workflow will process millions of rows of data, try running the
Workflow with just a few rows at first. You can then iteratively build the Workflow from start to finish, testing the
Workflow regularly on the live cluster.

To summarise:

1. Write business logic (and tests!)
1. Add script decorators
1. Write your Workflow
1. Test as you go!

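As a rough sketch of the first and last steps, the example below keeps the business logic as a plain Python function, tests it without any Argo or Hera imports, and uses a `limit` parameter so that a trial run only touches a few rows. The function names and the generated data are hypothetical stand-ins. In a real project the function would later be decorated with `@script`.

```python
import itertools


def total_by_category(rows):
    """Business logic: sum `amount` per `category`. No orchestration code here."""
    totals = {}
    for row in rows:
        totals[row["category"]] = totals.get(row["category"], 0) + row["amount"]
    return totals


def load_rows(limit=None):
    """Stand-in data source (hypothetical); `limit` keeps trial runs small."""
    rows = (
        {"category": "even" if i % 2 == 0 else "odd", "amount": i}
        for i in range(1_000_000)
    )
    return itertools.islice(rows, limit)


def test_total_by_category():
    sample = [
        {"category": "a", "amount": 1},
        {"category": "b", "amount": 2},
        {"category": "a", "amount": 3},
    ]
    assert total_by_category(sample) == {"a": 4, "b": 2}


test_total_by_category()
print(total_by_category(load_rows(limit=10)))
# → {'even': 20, 'odd': 25}
```

Passing `limit=None` processes the full data set, so the same code path can be used for the small trial run on the cluster and for the real run.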
## Workflows vs WorkflowTemplates vs ClusterWorkflowTemplates

WorkflowTemplates are intended to be collections of templates that live in your Kubernetes namespace and are used by other
Workflows. ClusterWorkflowTemplates are the same but are accessible from all Kubernetes namespaces, so they should perform
common actions like email or Slack alerts, as they will be accessible by teams across your organisation. For brevity
throughout the docs, when we refer to "WorkflowTemplates" we are also referring to "ClusterWorkflowTemplates".

In Hera, you will usually be writing Workflows, unless you find a common pattern or template usage, which can then be
extracted into a WorkflowTemplate.

### Distributing Python Libraries vs WorkflowTemplates

In Hera, WorkflowTemplates may seem redundant when we can build and distribute Python libraries of script-decorated
functions in versioned packages. To a Python end user, there is not much difference between a versioned
WorkflowTemplates approach and using plain old Python packages. The table below summarises the differences and final
recommendation:

| | WorkflowTemplates | Script Functions only (Python Packages) |
| --- | --- | --- |
| End User Usage | End users must use `TemplateRef`. | End users import the function and use it like a normal script. |
| Inputs / Outputs (Documentation) | Must be documented separately. Alternatively, viewable directly in the YAML (on cluster or GitHub). | Viewable in IDE from function code. |
| Versioning | Authors must name WorkflowTemplates with version numbers. Not supported natively in Argo Workflows. | New versions released through common Python versioning/release tools. Read more in [Versioning](#versioning) below. |
| Distribution | Can use GitOps tools (e.g. [Argo CD](https://argo-cd.readthedocs.io/en/stable/)). Requires a custom mechanism to notify users of, and update to, a new WorkflowTemplate version in their (YAML) Workflow code. | Users can upgrade to new versions through common dependency auto-updaters, or manually through `pip` or `poetry`, then build a new image using the new version. Read more in [CICD](#cicd) below. |
| Caveats | Argo Workflows is lacking in native versioning features, so you may need to build custom solutions. | Non-Python users cannot use your script functions, so you may also need to release WorkflowTemplates (as YAML), which requires more maintenance. |
| Recommendation | Use for small, common pieces of functionality. Use existing tools where possible for versioning and distribution. | Not generally recommended – use script functions (from Python packages) if your organisation _only_ uses Python (and you don't expect that to change). |

### Versioning

As Workflows and WorkflowTemplates are custom resources on Kubernetes (defined by Custom Resource Definitions, or CRDs),
they can be updated in-place. This means that if you change a template within a WorkflowTemplate and `apply` it on the
cluster, you will change the template for anyone who uses it in future, even if they were expecting the previous version.

Explicitly versioning WorkflowTemplates can avoid this issue, but is not natively supported in Argo Workflows. We can
create a reasonable workaround in Hera by leveraging Python package versioning.

We can simply include the Python package's version in the WorkflowTemplate name, which keeps the Python version and
WorkflowTemplate version in sync:

```python
import my_package
from hera.shared import global_config
from hera.workflows import WorkflowTemplate

VERSION = my_package.__version__
# Use the image built from the same package version.
global_config.image = f"my-package:v{VERSION}"

with WorkflowTemplate(name=f"my-package-wt-v{VERSION}") as w:
    ...
```

This also allows you to create "pre-releases" of WorkflowTemplates, letting you test them out privately before releasing
them (as ClusterWorkflowTemplates).

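One practical detail when putting a package version into a resource name: most Kubernetes resource names must be valid DNS subdomain names (lowercase alphanumerics, `-` and `.`), while Python version strings can contain other characters such as `+` for local versions. The helpers below are a hypothetical sketch, not part of Hera, showing one way to build a safe name and to roughly detect a pre-release version.

```python
import re


def template_name(package: str, version: str) -> str:
    """Build a Kubernetes-safe name such as `my-package-wt-v1.2.0rc1`."""
    raw = f"{package}-wt-v{version}".lower()
    # Replace anything outside [a-z0-9.-] (for example "+" or "_") with "-".
    name = re.sub(r"[^a-z0-9.-]+", "-", raw).strip("-.")
    if len(name) > 253:
        raise ValueError(f"name too long for Kubernetes: {name!r}")
    return name


def is_prerelease(version: str) -> bool:
    """Rough check for alpha, beta, release-candidate or dev markers."""
    public = version.split("+")[0].lower()  # ignore any local version suffix
    return re.search(r"(a|b|rc|dev)\d+", public) is not None


print(template_name("my-package", "1.2.0rc1"))
# → my-package-wt-v1.2.0rc1
print(is_prerelease("1.2.0rc1"), is_prerelease("1.2.0"))
# → True False
```

A CI job could use a check like `is_prerelease` to decide whether to publish a namespaced WorkflowTemplate for private testing or a ClusterWorkflowTemplate for general use. The regular expression is intentionally simple and is not a full version parser.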
### CICD

Following on from [Versioning](#versioning), we will need good Continuous Integration (CI) to test and Continuous
Deployment (CD) to deploy these versioned WorkflowTemplates.

### End-to-End Workflow Testing

For an end-to-end test in our CI, we'll need to build the Python image if using runner scripts, ensure all the Script
Templates use the new Python image, and then run the WorkflowTemplate as a Workflow. This can be achieved using an
environment variable from the CI tool. For example, we can run the following couple of lines from a shell:

```console
export IMAGE_NAME=my-package-image-test
python -m my_package.run_test_workflow
```

If we assume the WorkflowTemplate object can be imported from `my_package`, then the `run_test_workflow.py` file
might look like:

```python
import os

import my_package
from hera.shared import global_config

VERSION = my_package.__version__
# Set the image before the import below, which is assumed to build the WorkflowTemplate.
global_config.image = os.environ.get("IMAGE_NAME", f"my-package:v{VERSION}")

from my_package import workflow_template

workflow_template.create_as_workflow(generate_name="my-package-wt-test")
```

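The image lookup can also be pulled into a small function so that the fallback behaviour is testable without a cluster. This is a sketch under assumptions: `resolve_image` is a hypothetical helper, and the `my-package:v…` naming simply follows the examples in this guide. Unlike a plain `os.environ.get("IMAGE_NAME", default)`, it also falls back when the variable is set but empty, which some CI tools do for undefined values.

```python
import os


def resolve_image(version: str, env=os.environ) -> str:
    """Return the CI-provided image if set, else the released image for `version`."""
    override = env.get("IMAGE_NAME", "").strip()
    return override or f"my-package:v{version}"


print(resolve_image("1.2.0", env={}))
# → my-package:v1.2.0
print(resolve_image("1.2.0", env={"IMAGE_NAME": "my-package-image-test"}))
# → my-package-image-test
```

Passing `env` explicitly keeps the function free of global state in tests, while the default still reads the real environment in CI.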
### WorkflowTemplate Deployment

If you don't have a dedicated GitOps CD tool like [Argo CD](https://argo-cd.readthedocs.io/en/stable/) (which is
recommended), your CI can run a deployment step.

For example, the following could be in a `deploy_workflow_template.py` file which runs in CI:

```python
import my_package
from hera.shared import global_config
from hera.workflows import WorkflowTemplate

VERSION = my_package.__version__
global_config.image = f"my-package:v{VERSION}"

with WorkflowTemplate(name=f"my-package-wt-v{VERSION}") as w:
    ...

# Create the versioned WorkflowTemplate on the cluster.
w.create()
```

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ nav:
 - Constructors: user-guides/script-constructors.md
 - Annotations: user-guides/script-annotations.md
 - Runner IO: user-guides/script-runner-io.md
+- Best Practices: user-guides/best-practices.md
 - Suspending Workflows: user-guides/suspending.md
 - Decorators: user-guides/decorators.md
 - Expr Transpiler: user-guides/expr.md
