| 1 | +# Best Practices |
| 2 | + |
| 3 | +## Inline vs Runner Scripts |
| 4 | + |
| 5 | +[Runner scripts](script-constructors.md#runner-scripts) have a richer feature set than inline scripts, so they are the |
| 6 | +recommended path. Iterating on them by hand can be cumbersome, so you will likely need to set up a CICD solution. |
| 7 | +Read more in [Iterating on a Workflow](#iterating-on-a-workflow) and [CICD](#cicd). |
| 8 | + |
| 9 | +## How To Write Workflows |
| 10 | + |
| 11 | +### Laying Out Your Python Code |
| 12 | + |
| 13 | +Hera aims to keep orchestration code outside of your business logic, meaning you should write business logic in |
| 14 | +functions decorated with `@script`, and construct your Workflow separately, as seen throughout the examples. This might |
| 15 | +involve putting code in different submodules in your application, but overall you are free to write your Workflow how |
| 16 | +you want. |
| 17 | + |
| 18 | +An example directory structure for a project using `poetry` might be: |
| 19 | + |
| 20 | +```console |
| 21 | +. |
| 22 | +├── Dockerfile |
| 23 | +├── hera_scratch |
| 24 | +│ ├── __main__.py |
| 25 | +│ └── workflow.py |
| 26 | +├── Makefile |
| 27 | +├── poetry.lock |
| 28 | +├── pyproject.toml |
| 29 | +├── README.md |
| 30 | +└── requirements.txt |
| 31 | +``` |
| 32 | + |
| 33 | +In this project, we use: |
| 34 | + |
| 35 | +* a `Dockerfile` to create an image for the runner script, using the packages in `requirements.txt` and a command |
| 36 | + contained in the `Makefile` |
| 37 | +* a `__main__.py` file to run the Workflow from `workflow.py` on Argo Workflows |
| 38 | +* standard poetry/Python files |
| 39 | + |
| 40 | +See this layout in the [example Hera project repo](https://github.com/elliotgunton/hera-example-project)! |
| 41 | + |
| 42 | +### Steps vs DAGs |
| 43 | + |
| 44 | +* Use Steps for simple, sequential processing |
| 45 | +* Use DAGs when you want to describe only the dependencies between tasks, not their ordering, so that as many tasks as possible run in parallel |
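
To make the distinction concrete, the sketch below uses Python's standard-library `graphlib` (not Hera or Argo) to show how declaring only dependencies leaves a scheduler free to run independent tasks together. The task names `A` to `D` and the diamond shape are made up for illustration, and Argo's own scheduling is more involved than this.

```python
from graphlib import TopologicalSorter

# Hypothetical diamond: B and C depend on A, and D depends on both B and C.
dependencies = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}


def parallel_waves(deps):
    """Group tasks into waves; tasks in the same wave do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = parallel_waves(dependencies)
print(waves)
```

Here `B` and `C` land in the same wave because nothing orders them relative to each other. Expressed as Steps, the same work would need that order spelled out explicitly.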
| 46 | + |
| 47 | +### Iterating on a Workflow |
| 48 | + |
| 49 | +Iterating on a Workflow can be cumbersome, as Argo does not offer a way to "dry run" your Workflow, and developers |
| 50 | +often get caught out by simple errors (such as unresolved variables). |
| 51 | + |
| 52 | +Hera should help reduce the number of iterations you need on the Workflow itself, but first you should write your |
| 53 | +business logic in isolation and test it as normal Python (if you are using Python). The Workflow should then be written |
| 54 | +with the intention of testing the plumbing between steps on a live Argo Workflows installation. Your business logic |
| 55 | +should be easily "mockable": if your Workflow will process millions of rows of data, try running it with just a few |
| 56 | +rows at first. You can then build the Workflow iteratively from start to finish, testing it regularly on the live |
| 57 | +cluster. |
| 58 | + |
| 59 | +To summarise: |
| 60 | + |
| 61 | +1. Write business logic (and tests!) |
| 62 | +1. Add script decorators |
| 63 | +1. Write your Workflow |
| 64 | +1. Test as you go! |
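
As a sketch of the first and last points, the business logic below is plain Python that takes an optional row limit, so the same function can be unit tested locally and then run on a handful of rows on the cluster. The function name `total_amount`, the `limit` parameter and the row shape are hypothetical, and the `@script` decorator is left out so the example runs without Hera installed.

```python
from itertools import islice
from typing import Iterable, Optional


def total_amount(rows: Iterable[dict], limit: Optional[int] = None) -> float:
    """Sum the "amount" field, optionally reading only the first `limit` rows."""
    if limit is not None:
        rows = islice(rows, limit)  # cheap way to run on a small sample
    return sum(row["amount"] for row in rows)


# Stand-in for a large dataset: amounts 0, 1, 2, ...
rows = [{"amount": float(i)} for i in range(1_000)]

sample_total = total_amount(rows, limit=3)  # only the first three rows
full_total = total_amount(rows)
print(sample_total, full_total)
```

Exposing the limit as a Workflow parameter would let a test run on the cluster use a small value while a production run leaves it unset.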
| 65 | + |
| 66 | +## Workflows vs WorkflowTemplates vs ClusterWorkflowTemplates |
| 67 | + |
| 68 | +WorkflowTemplates are intended to be collections of templates that live in your Kubernetes namespace and are used by |
| 69 | +other Workflows. ClusterWorkflowTemplates are the same but are accessible from all Kubernetes namespaces, which makes |
| 70 | +them a good fit for actions common to teams across your organisation, such as email or Slack alerts. For brevity |
| 71 | +throughout the docs, when we refer to "WorkflowTemplates" we are also referring to "ClusterWorkflowTemplates". |
| 72 | + |
| 73 | +In Hera, you will usually be writing Workflows, unless you find a common pattern or template usage, which can then be |
| 74 | +extracted into a WorkflowTemplate. |
| 75 | + |
| 76 | +### Distributing Python Libraries vs WorkflowTemplates |
| 77 | + |
| 78 | +In Hera, WorkflowTemplates may seem redundant when we can build and distribute Python libraries of script-decorated |
| 79 | +functions in versioned packages. To a Python end user, there is not much difference between a versioned |
| 80 | +WorkflowTemplates approach and plain old Python packages. The table below summarises the differences and gives a final |
| 81 | +recommendation: |
| 82 | + |
| 83 | +| | WorkflowTemplates | Script Functions only (Python Packages) | |
| 84 | +| --- | --- | --- | |
| 85 | +| End User Usage | End users must use `TemplateRef`. | End users import the function and use it like a normal script. | |
| 86 | +| Inputs / Outputs (Documentation) | Must be documented separately. Alternatively, viewable directly in the YAML (on cluster or GitHub). | Viewable in IDE from function code. | |
| 87 | +| Versioning | Authors must name WorkflowTemplates with version numbers. Not supported natively in Argo Workflows. | New versions released through common Python versioning/release tools. Read more in [Versioning](#versioning) below. | |
| 88 | +| Distribution | Can use GitOps tools (e.g. [Argo CD](https://argo-cd.readthedocs.io/en/stable/)). Requires a custom mechanism to notify users of a new WorkflowTemplate version and to update it in their (YAML) Workflow code. | Users can upgrade to new versions through common dependency auto-updaters, or manually through `pip` or `poetry`, then build a new image using the new version. Read more in [CICD](#cicd) below. | |
| 89 | +| Caveats | Argo Workflows is lacking in native versioning features, so you may need to build custom solutions. | Non-Python users cannot use your script functions, so you may also need to release WorkflowTemplates (as YAML), which requires more maintenance. | |
| 90 | +| Recommendation | Use for small, common pieces of functionality. Use existing tools where possible for versioning and distribution. | Not generally recommended – use script functions (from Python packages) if your organisation _only_ uses Python (and you don't expect that to change). | |
| 91 | + |
| 92 | +### Versioning |
| 93 | + |
| 94 | +As Workflows and WorkflowTemplates are custom resources on Kubernetes (defined by Custom Resource Definitions, or CRDs), |
| 95 | +they can be updated in-place. This means that if you change a template within a WorkflowTemplate and `apply` it on the |
| 96 | +cluster, anyone who uses the template afterwards gets the new behaviour, even if they were expecting the previous version. |
| 97 | + |
| 98 | +Explicitly versioning WorkflowTemplates can avoid this issue, but is not natively supported in Argo Workflows. We can |
| 99 | +create a reasonable workaround in Hera by leveraging Python package versioning. |
| 100 | + |
| 101 | +We can simply include the Python package's version in the WorkflowTemplate name, which keeps the Python version and |
| 102 | +WorkflowTemplate version in sync: |
| 103 | + |
```python
from hera.shared import global_config
from hera.workflows import WorkflowTemplate

import my_package

# Use the package version for both the image tag and the WorkflowTemplate name
VERSION = my_package.__version__
global_config.image = f"my-package:v{VERSION}"

with WorkflowTemplate(name=f"my-package-wt-v{VERSION}") as w:
    ...
```
| 113 | + |
| 114 | +This also allows you to create "pre-releases" of WorkflowTemplates, letting you test them out privately before releasing |
| 115 | +them (as ClusterWorkflowTemplates). |
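
One thing to watch when embedding a version in a name is that Kubernetes resource names are generally restricted to lowercase alphanumeric characters, `-` and `.`, so a version such as `1.3.0+Build5` cannot be used verbatim. The helper below is a minimal sketch of one way to normalise it; `template_name` is a hypothetical function, not part of Hera.

```python
import re


def template_name(package: str, version: str) -> str:
    """Build a WorkflowTemplate name from a package name and a version string."""
    raw = f"{package}-wt-v{version}".lower()
    # Replace anything outside [a-z0-9.-] with "-" (for example "+" or "_")
    cleaned = re.sub(r"[^a-z0-9.-]+", "-", raw)
    # Names must start and end with an alphanumeric character
    return cleaned.strip("-.")


release = template_name("my-package", "1.2.0")
pre_release = template_name("my-package", "1.3.0rc1")
local_build = template_name("my_package", "1.3.0+Build5")
print(release, pre_release, local_build)
```

The release and pre-release names pass through unchanged, while the local build version has its `+` and `_` replaced with `-`.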
| 116 | + |
| 117 | +### CICD |
| 118 | + |
| 119 | +Following on from [Versioning](#versioning), we will need good Continuous Integration (CI) to test and Continuous |
| 120 | +Deployment (CD) to deploy these versioned WorkflowTemplates. |
| 121 | + |
| 122 | +### End-to-End Workflow Testing |
| 123 | + |
| 124 | +For an end-to-end test in our CI, we'll need to build the Python image if using runner scripts, ensure all the Script |
| 125 | +Templates use the new Python image, and then run the WorkflowTemplate as a Workflow. This can be achieved using an |
| 126 | +environment variable from the CI tool. For example, we can run the following couple of lines from a shell: |
| 127 | + |
```console
export IMAGE_NAME=my-package-image-test
python -m my_package.run_test_workflow
```
| 132 | + |
| 133 | +If we assume the WorkflowTemplate object can be imported from `my_package`, the `run_test_workflow.py` file |
| 134 | +might look like: |
| 135 | + |
```python
import os

from hera.shared import global_config
import my_package

VERSION = my_package.__version__
# Prefer the image built by CI (IMAGE_NAME), else fall back to the released image
global_config.image = os.environ.get("IMAGE_NAME", f"my-package:v{VERSION}")

from my_package import workflow_template

workflow_template.create_as_workflow(generate_name="my-package-wt-test")
```
| 144 | + |
| 145 | +### WorkflowTemplate Deployment |
| 146 | + |
| 147 | +A dedicated GitOps CD tool like [Argo CD](https://argo-cd.readthedocs.io/en/stable/) is recommended. If you don't |
| 148 | +have one, your CI can run a deployment step instead. |
| 149 | + |
| 150 | +For example, the following could be in a `deploy_workflow_template.py` file which runs in CI: |
| 151 | + |
```python
from hera.shared import global_config
from hera.workflows import WorkflowTemplate

import my_package

VERSION = my_package.__version__
global_config.image = f"my-package:v{VERSION}"

with WorkflowTemplate(name=f"my-package-wt-v{VERSION}") as w:
    ...

# Create the versioned WorkflowTemplate on the cluster
w.create()
```