| 1 | +--- |
| 2 | +title: "Nested Workflows" |
| 3 | +teaching: 10 |
| 4 | +exercises: 0 |
| 5 | +questions: |
| 6 | +- "How do I connect multiple workflows together?" |
| 7 | +objectives: |
| 8 | +- "Learn how to construct nested workflows from multiple CWL workflow |
| 9 | +descriptions." |
| 10 | +keypoints: |
| 11 | +- "A workflow can be used as a step in another workflow, if the workflow engine |
| 12 | +supports the `SubworkflowFeatureRequirement`." |
| 13 | +- "The workflows are specified under `steps`, with the workflow's description |
| 14 | +file provided as the value to the `run` field." |
| 15 | +- "Use `default` to specify a default value for a field, which can be |
| 16 | +overwritten by a value in the input object." |
| 17 | +- "Use `>` to ignore newlines in long commands split over multiple lines." |
| 18 | +--- |
| 19 | +Workflows are ways to combine multiple tools to perform a larger operation. |
| 20 | +We can also think of a workflow as being a tool itself; a CWL workflow can be |
| 21 | +used as a step in another CWL workflow, if the workflow engine supports the |
| 22 | +`SubworkflowFeatureRequirement`: |
| 23 | + |
| 24 | + |
| 25 | +``` |
| 26 | +requirements: |
| 27 | +  - class: SubworkflowFeatureRequirement |
| 28 | +``` |
| 29 | + |
| 30 | +Here's an example workflow that uses our `1st-workflow.cwl` as a nested |
| 31 | +workflow: |
| 32 | + |
| 33 | +``` |
| 34 | +{% include cwl/nestedworkflows.cwl %} |
| 35 | +``` |
| 36 | + |
| 37 | +A CWL `Workflow` can be used as a `step` just like a `CommandLineTool`; its CWL |
| 38 | +file is included with `run`. The workflow inputs (`inp` and `ex`) and outputs |
| 39 | +(`classout`) can then be mapped to become the step's inputs and outputs. |
| 40 | + |
| 41 | +``` |
| 42 | +  compile: |
| 43 | +    run: 1st-workflow.cwl |
| 44 | +    in: |
| 45 | +      inp: |
| 46 | +        source: create-tar/tar |
| 47 | +      ex: |
| 48 | +        default: "Hello.java" |
| 49 | +    out: [classout] |
| 50 | +``` |
| 51 | + |
| 52 | +Our `1st-workflow.cwl` was parameterized with workflow inputs, so when running |
| 53 | +it we had to provide a job file to denote the tar file and `*.java` filename. |
| 54 | +This is generally best practice, as it means the workflow can be reused in |
| 55 | +multiple parent workflows, or even in multiple steps within the same workflow. |
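|  | + |
|  | +As a minimal sketch, a job file for `1st-workflow.cwl` might look like the |
|  | +following. The input names `inp` and `ex` come from the workflow itself; the |
|  | +path `hello.tar` is only an assumed example value. |
|  | + |
|  | +``` |
|  | +# Job file sketch; `hello.tar` is an assumed example path |
|  | +inp: |
|  | +  class: File |
|  | +  path: hello.tar |
|  | +ex: Hello.java |
|  | +``` |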
| 56 | + |
| 57 | +Here we use `default:` to hard-code `"Hello.java"` as the `ex` input. However, |
| 58 | +our workflow also requires a tar file at `inp`, which we will prepare in the |
| 59 | +`create-tar` step. At this point it is probably a good idea to refactor |
| 60 | +`1st-workflow.cwl` to have more specific input/output names, as those names |
| 61 | +also appear when it is used as a step. |
| 62 | + |
| 63 | +It is also possible to take a less generic approach and avoid external |
| 64 | +dependencies in the job file. In this workflow we can generate a hard-coded |
| 65 | +`Hello.java` file using the previously mentioned `InitialWorkDirRequirement`, |
| 66 | +before adding it to a tar file. |
| 67 | + |
| 68 | +``` |
| 69 | +  create-tar: |
| 70 | +    requirements: |
| 71 | +      - class: InitialWorkDirRequirement |
| 72 | +        listing: |
| 73 | +          - entryname: Hello.java |
| 74 | +            entry: | |
| 75 | +              public class Hello { |
| 76 | +                public static void main(String[] argv) { |
| 77 | +                    System.out.println("Hello from Java"); |
| 78 | +                } |
| 79 | +              } |
| 80 | +``` |
| 81 | + |
| 82 | +In this case our step can assume `Hello.java` rather than be parameterized, so |
| 83 | +we can use a simpler `arguments` form as long as the CWL workflow engine |
| 84 | +supports the `ShellCommandRequirement`: |
| 85 | + |
| 86 | +``` |
| 87 | +    run: |
| 88 | +      class: CommandLineTool |
| 89 | +      requirements: |
| 90 | +        - class: ShellCommandRequirement |
| 91 | +      arguments: |
| 92 | +        - shellQuote: false |
| 93 | +          valueFrom: > |
| 94 | +            tar cf hello.tar Hello.java |
| 95 | +``` |
| 96 | + |
| 97 | +Note the use of `shellQuote: false` here; otherwise the shell will try to |
| 98 | +execute the quoted binary `"tar cf hello.tar Hello.java"`. |
| 99 | + |
| 100 | +Here the `>` block means that newlines are folded into spaces, so it's possible |
| 101 | +to write the single command on multiple lines. Similarly, the `|` we used above |
| 102 | +will preserve newlines; combined with `ShellCommandRequirement`, this would |
| 103 | +allow embedding a shell script. |
| 104 | +Shell commands should however be used sparingly in CWL, as they mean you |
| 105 | +"jump out" of the workflow and no longer get reusable components, provenance or |
| 106 | +scalability. For reproducibility and portability it is recommended to only use |
| 107 | +shell commands together with a `DockerRequirement` hint, so that the commands |
| 108 | +are executed in a predictable shell environment. |
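|  | + |
|  | +A minimal sketch of that recommendation could look like the following. The |
|  | +`dockerPull` value is an assumption chosen for illustration, not something the |
|  | +original workflow specifies. |
|  | + |
|  | +``` |
|  | +    run: |
|  | +      class: CommandLineTool |
|  | +      hints: |
|  | +        # Assumed image; any image that provides a shell and `tar` would do |
|  | +        - class: DockerRequirement |
|  | +          dockerPull: debian:stable-slim |
|  | +      requirements: |
|  | +        - class: ShellCommandRequirement |
|  | +``` |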
| 109 | + |
| 110 | +Did you notice that we didn't split out the `tar cf` tool to a separate file, |
| 111 | +but rather embedded it within the CWL Workflow file? This is generally not best |
| 112 | +practice, as the tool then can't be reused. The reason for doing it in this case |
| 113 | +is that the command line is hard-coded with filenames that only make sense |
| 114 | +within this workflow. |
| 115 | + |
| 116 | +In this example we had to prepare a tar file outside the inner workflow, but |
| 117 | +only because it was designed to take that as an input. A better refactoring of |
| 118 | +the inner workflow would be to take a list of Java files to compile, which would |
| 119 | +simplify its usage as a tool step in other workflows. |
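|  | + |
|  | +One way such a refactoring might be used is sketched below. All names here |
|  | +(`compile-java.cwl`, `sources`, `java_files`) are hypothetical and do not exist |
|  | +in the lesson's files. |
|  | + |
|  | +``` |
|  | +  compile: |
|  | +    # Hypothetical refactored inner workflow declaring `sources: File[]` |
|  | +    run: compile-java.cwl |
|  | +    in: |
|  | +      # `java_files` is an assumed outer workflow input of type File[] |
|  | +      sources: java_files |
|  | +    out: [classout] |
|  | +``` |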
| 120 | + |
| 121 | +Nested workflows can be a powerful feature to generate higher-level functional |
| 122 | +and reusable workflow units, but just as when creating a CWL tool description, |
| 123 | +care must be taken to keep them usable in multiple workflows. |