Commit f38b082

Author: Toby Hodges
Commit message: added nested workflow episode
1 parent 1c6d23d, commit f38b082

1 file changed (+123, -0 lines)

_episodes/17-nested-workflows.md

@@ -0,0 +1,123 @@
---
title: "Nested Workflows"
teaching: 10
exercises: 0
questions:
- "How do I connect multiple workflows together?"
objectives:
- "Learn how to construct nested workflows from multiple CWL workflow
  descriptions."
keypoints:
- "A workflow can be used as a step in another workflow, if the workflow engine
  supports the `SubworkflowFeatureRequirement`."
- "Nested workflows are specified under `steps`, with the workflow's description
  file provided as the value of the `run` field."
- "Use `default` to specify a default value for a field, which can be
  overridden by a value in the input object."
- "Use `>` to fold newlines into spaces in long commands split over multiple
  lines."
---

Workflows are ways to combine multiple tools to perform larger operations.
We can also think of a workflow as being a tool itself; a CWL workflow can be
used as a step in another CWL workflow, if the workflow engine supports the
`SubworkflowFeatureRequirement`:

```
requirements:
  - class: SubworkflowFeatureRequirement
```
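
The sketch below shows where this block sits at the top of a parent workflow file. The `cwlVersion: v1.0` line is an assumption, so match the version your other lesson files declare. CWL also accepts requirements written as a map keyed by class name. That form is shown commented out as an alternative to the list form:

```
#!/usr/bin/env cwl-runner
cwlVersion: v1.0   # assumed version; use the one your other files declare
class: Workflow

# List form, as used in this episode:
requirements:
  - class: SubworkflowFeatureRequirement

# Equivalent map form (use one form or the other, not both):
# requirements:
#   SubworkflowFeatureRequirement: {}
```

The `inputs`, `outputs` and `steps` sections then follow as in any other workflow.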

Here's an example workflow that uses our `1st-workflow.cwl` as a nested
workflow:

```
{% include cwl/nestedworkflows.cwl %}
```

A CWL `Workflow` can be used as a `step` just like a `CommandLineTool`; its CWL
file is included with `run`. The workflow's inputs (`inp` and `ex`) and output
(`classout`) can then be mapped to the step's inputs and outputs.

```
compile:
  run: 1st-workflow.cwl
  in:
    inp:
      source: create-tar/tar
    ex:
      default: "Hello.java"
  out: [classout]
```
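
The following excerpt-style sketch shows how the names line up across the two files. It covers the interface of `1st-workflow.cwl` that the `compile` step relies on. It is not a complete file, because the inner steps are omitted. The types and the `outputSource` value are assumptions based on the earlier workflow episode, so compare them with your own copy:

```
# 1st-workflow.cwl, interface only (inner steps omitted)
cwlVersion: v1.0
class: Workflow

inputs:
  inp: File    # receives create-tar/tar from the parent's compile step
  ex: string   # receives the step-level default "Hello.java"

outputs:
  classout:    # exposed to the parent through `out: [classout]`
    type: File
    outputSource: compile/classfile   # assumed inner step and output names
```

In the parent workflow, the nested result is then referenced as `compile/classout`, for example from a workflow output declared with `outputSource: compile/classout`.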

Our `1st-workflow.cwl` was parameterized with workflow inputs, so when running
it we had to provide a job file to denote the tar file and `*.java` filename.
This is generally best practice, as it means the workflow can be reused in
multiple parent workflows, or even in multiple steps within the same workflow.
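
To illustrate the reuse point, the sketch below shows one parent workflow that runs `1st-workflow.cwl` in two separate steps. The input `tarball`, the file `Goodbye.java` and the step names are all hypothetical. The sketch assumes that the tar file contains both source files and that `1st-workflow.cwl` has the `inp`, `ex` and `classout` interface used above:

```
#!/usr/bin/env cwl-runner
# Hypothetical sketch: one nested workflow reused by two steps.
cwlVersion: v1.0
class: Workflow

requirements:
  - class: SubworkflowFeatureRequirement

inputs:
  tarball: File   # hypothetical; assumed to contain both .java files

outputs:
  hello_class:
    type: File
    outputSource: compile-hello/classout
  goodbye_class:
    type: File
    outputSource: compile-goodbye/classout

steps:
  compile-hello:
    run: 1st-workflow.cwl
    in:
      inp: tarball          # shorthand for `source: tarball`
      ex:
        default: "Hello.java"
    out: [classout]

  compile-goodbye:
    run: 1st-workflow.cwl   # same file, second use
    in:
      inp: tarball
      ex:
        default: "Goodbye.java"
    out: [classout]
```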

Here we use `default:` to hard-code `"Hello.java"` as the `ex` input; however,
our workflow also requires a tar file at `inp`, which we will prepare in the
`create-tar` step. At this point it is probably a good idea to refactor
`1st-workflow.cwl` to have more specific input/output names, as those also
appear in its usage as a tool.
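
CWL also allows `default` on a workflow input, where a value supplied in the input object (job file) takes precedence. By comparison, the step-level `default` above is only used when the step input has no `source`, or when its source yields `null`. Here is a minimal sketch that assumes a hypothetical refactoring, in which the parent workflow exposes `ex` as its own input. This is not what the included workflow does:

```
# Parent workflow excerpt (hypothetical): `ex` becomes a workflow input
inputs:
  ex:
    type: string
    default: "Hello.java"   # used when the job file does not set `ex`

steps:
  compile:
    run: 1st-workflow.cwl
    in:
      inp: create-tar/tar   # shorthand for `source: create-tar/tar`
      ex: ex                # passes the workflow input through
    out: [classout]
```

A job file containing `ex: Goodbye.java` would then replace the default value. A job file without `ex` falls back to `"Hello.java"`.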

It is also possible to take a less generic approach and avoid external
dependencies in the job file. In this workflow we can generate a hard-coded
`Hello.java` file using the previously mentioned `InitialWorkDirRequirement`,
before adding it to a tar file.

```
create-tar:
  requirements:
    - class: InitialWorkDirRequirement
      listing:
        - entryname: Hello.java
          entry: |
            public class Hello {
              public static void main(String[] argv) {
                  System.out.println("Hello from Java");
              }
            }
```

In this case our step can assume `Hello.java` rather than being parameterized,
so we can use a simpler, hard-coded `arguments` form, as long as the CWL
workflow engine supports the `ShellCommandRequirement`:

```
run:
  class: CommandLineTool
  requirements:
    - class: ShellCommandRequirement
  arguments:
    - shellQuote: false
      valueFrom: >
        tar cf hello.tar Hello.java
```

Note the use of `shellQuote: false` here. Without it, the whole argument is
quoted, and the shell will try to execute the quoted string
`"tar cf hello.tar Hello.java"` as the name of a single program.
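
The two fragments above can be put together into a complete `create-tar` step, as in the sketch below. The `in: []`, `out: [tar]`, `inputs: []` and `outputs` entries do not appear in the fragments, so they are assumptions. The output must be called `tar`, because `compile` reads it as `create-tar/tar`:

```
# Sketch of the full create-tar step (inside the parent workflow's `steps:`)
create-tar:
  requirements:
    - class: InitialWorkDirRequirement
      listing:
        - entryname: Hello.java
          entry: |
            public class Hello {
              public static void main(String[] argv) {
                  System.out.println("Hello from Java");
              }
            }
  in: []          # no inputs: the file content is hard-coded above
  out: [tar]      # referenced by the compile step as create-tar/tar
  run:
    class: CommandLineTool
    requirements:
      - class: ShellCommandRequirement
    inputs: []    # a CommandLineTool must still declare (empty) inputs
    arguments:
      - shellQuote: false
        valueFrom: >
          tar cf hello.tar Hello.java
    outputs:
      tar:
        type: File
        outputBinding:
          glob: hello.tar   # collect the archive written by the command
```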

Here the `>` block folds newlines into spaces, so it's possible to write a
single command over multiple lines. By contrast, the `|` block we used above
preserves newlines; combined with `ShellCommandRequirement`, this would allow
embedding a shell script.
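
As an illustration of that last point, here is a sketch of an `arguments` entry that embeds a two-line script using `|`. The `mkdir` line and the `out/` directory are hypothetical additions. Because the joined command line is run by a shell, the two lines should execute one after the other:

```
# Fragment of the embedded tool (hypothetical two-line script)
arguments:
  - shellQuote: false
    valueFrom: |
      mkdir -p out
      tar cf out/hello.tar Hello.java
```

With this change, the tool's `outputs` glob would also need to become `out/hello.tar`.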

Shell commands should, however, be used sparingly in CWL, as they mean you
"jump out" of the workflow and no longer get reusable components, provenance or
scalability. For reproducibility and portability, it is recommended to use
shell commands only together with a `DockerRequirement` hint, so that the
commands are executed in a predictable shell environment.
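
Here is a minimal sketch of such a hint on the embedded tool. The image `debian:stable-slim` is only an assumed example of an image that provides `sh` and `tar`; the lesson does not prescribe a particular image:

```
run:
  class: CommandLineTool
  requirements:
    - class: ShellCommandRequirement
  hints:
    - class: DockerRequirement
      dockerPull: debian:stable-slim   # assumed image; any image with sh and tar should do
  inputs: []
  arguments:
    - shellQuote: false
      valueFrom: >
        tar cf hello.tar Hello.java
  outputs:
    tar:
      type: File
      outputBinding:
        glob: hello.tar
```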

Did you notice that we didn't split out the `tar cf` tool into a separate file,
but rather embedded it within the CWL Workflow file? This is generally not best
practice, as the tool then can't be reused. The reason for doing it in this case
is that the command line is hard-coded with filenames that only make sense
within this workflow.

In this example we had to prepare a tar file outside the inner workflow, but
only because the inner workflow was designed to take one as an input. A better
refactoring of the inner workflow would be to take a list of Java files to
compile, which would simplify its usage as a tool step in other workflows.
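
The suggested refactoring might look roughly like the sketch below. The input name `java_sources`, the output name `classfiles` and the inline `javac` tool are hypothetical. The sketch also assumes `javac` is available on the host; otherwise, add a `DockerRequirement` hint:

```
#!/usr/bin/env cwl-runner
# Hypothetical refactoring of the inner workflow: compile a list of
# .java files directly, so callers no longer need to build a tar file.
cwlVersion: v1.0
class: Workflow

inputs:
  java_sources: File[]   # hypothetical input name

outputs:
  classfiles:
    type: File[]
    outputSource: compile/classfiles

steps:
  compile:
    run:
      class: CommandLineTool
      baseCommand: javac
      arguments: ["-d", $(runtime.outdir)]   # write .class files to the output dir
      inputs:
        src:
          type: File[]
          inputBinding:
            position: 1   # each file becomes a separate argument
      outputs:
        classfiles:
          type: File[]
          outputBinding:
            glob: "*.class"
    in:
      src: java_sources
    out: [classfiles]
```

A parent workflow could then pass its Java files straight to this workflow as a `File[]`, instead of building a tar file first.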

Nested workflows can be a powerful way to build higher-level, functional and
reusable workflow units. However, just as when creating a CWL tool description,
care must be taken to make them usable in multiple workflows.
